US20220292801A1 - Formatting Views of Whiteboards in Conjunction with Presenters - Google Patents

Formatting Views of Whiteboards in Conjunction with Presenters

Info

Publication number
US20220292801A1
US20220292801A1
Authority
US
United States
Prior art keywords
whiteboard
talker
writing
presentation device
interactive group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/654,585
Inventor
Stephen Paul Schaefer
David A. Bryan
Rommel Gabriel Childress, Jr.
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Development Co LP
Original Assignee
Plantronics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2021-03-15
Filing date
Publication date
Application filed by Plantronics Inc
Priority to US17/654,585
Assigned to PLANTRONICS, INC.: Assignment of assignors interest (see document for details). Assignors: BRYAN, DAVID A.; CHILDRESS, ROMMEL GABRIEL, JR.; SCHAEFER, STEPHEN PAUL
Publication of US20220292801A1
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.: Nunc pro tunc assignment (see document for details). Assignors: PLANTRONICS, INC.
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working
    • H04N 7/141: Systems for two-way working between two video terminals, e.g. videophone
    • H04N 7/147: Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/70: Circuitry for compensating brightness variation in the scene
    • H04N 23/741: Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/24: Aligning, centring, orientation detection or correction of the image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161: Detection; Localisation; Normalisation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/45: Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from two or more image sensors being of different type or operating in different modes, e.g. with a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/80: Camera processing pipelines; Components thereof
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working
    • H04N 7/15: Conference systems
    • H04N 7/152: Multipoint control units therefor

Abstract

A videoconferencing endpoint determines if there is a whiteboard and if a presenter is near the whiteboard. If there is no whiteboard in view or the presenter is not near the whiteboard, any content from a camera focused on the whiteboard is continued and any presenter framing is done normally. If the presenter is in front of the whiteboard, any whiteboard content is ended, and appropriate portions of the whiteboard are included in the main video stream framed with the presenter. If the whiteboard is empty, framing is done without reference to the whiteboard. If the whiteboard is full or has writing away from the presenter, the entire whiteboard and the presenter are framed together. If the whiteboard only has writing near the presenter, only the relevant portion of the whiteboard is framed with the presenter.

Description

    CROSS-REFERENCE
  • This application claims priority to U.S. Provisional Application Ser. No. 63/161,133, filed Mar. 15, 2021, the contents of which are incorporated herein in their entirety by reference.
  • TECHNICAL FIELD
  • This disclosure relates generally to videoconferencing and relates particularly to detection of whiteboards and individuals in one or more captured audio-visual streams.
  • BACKGROUND
  • Currently, whiteboards are treated primarily as content sources, so that the whiteboard is provided as a content stream. A presenter is seen in a video stream, even as he moves around. In some cases, a camera is dedicated to a whiteboard, but then a user must switch the video source being provided to the far end between the whiteboard and the presenter. If the presenter is standing near or in front of the whiteboard, any framing with the whiteboard can become confusing. For example, the whiteboard is provided in the content stream and displayed on a content monitor, but the whiteboard is also present in the presenter video stream and the main monitor.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For illustration, there are shown in the drawings certain examples described in the present disclosure. In the drawings, like numerals indicate like elements throughout. The full scope of the inventions disclosed herein is not limited to the precise arrangements, dimensions, and instruments shown. In the drawings:
  • FIG. 1 is an illustration of a conference room including multiple cameras and a whiteboard according to examples of the present disclosure.
  • FIG. 2A is an illustration of a presenter separated from a whiteboard and the resulting framing according to examples of the present disclosure.
  • FIG. 2B is an illustration of a presenter standing in front of an empty whiteboard and the resulting framing according to examples of the present disclosure.
  • FIG. 2C is an illustration of a presenter standing in front of a whiteboard full of writing and the resulting framing according to examples of the present disclosure.
  • FIG. 2D is an illustration of a presenter standing in front of a whiteboard only partially filled with writing and the resulting framing according to examples of the present disclosure.
  • FIG. 3 is a high-level flowchart of framing operations according to examples of the present disclosure.
  • FIG. 4A is a flowchart of framing operations involving a presenter and a whiteboard according to examples of the present disclosure.
  • FIG. 4B is the flowchart of FIG. 4A with added delay periods.
  • FIG. 4C is the flowchart of FIG. 4A when the whiteboard is never provided as content.
  • FIG. 5 is a high-level block diagram of a videoconferencing system according to examples of the present disclosure.
  • FIG. 6 is a more detailed block diagram of the videoconferencing system of FIG. 5 according to examples of the present disclosure.
  • FIG. 7 is a block diagram of a system on a chip for use in the videoconferencing systems of FIGS. 5 and 6.
  • DETAILED DESCRIPTION OF THE EMBODIMENTS
  • Far end viewer comprehension is improved in examples according to the present disclosure. A near end videoconferencing endpoint determines if there is a whiteboard and if a presenter is near the whiteboard. If there is no whiteboard in view or the presenter is not near the whiteboard, any content from a camera focused on the whiteboard is continued and any presenter framing is done normally. If the presenter is in front of the whiteboard, any whiteboard content is ended, and appropriate portions of the whiteboard are included in the main video stream framed with the presenter. If the whiteboard is empty, framing is done without reference to the whiteboard. If the whiteboard is full or has writing away from the presenter, the entire whiteboard and the presenter are framed together. If the whiteboard only has writing near the presenter, only the relevant portion of the whiteboard is framed with the presenter. By including the whiteboard in the framing with the presenter and turning off any whiteboard content stream when the presenter is near the whiteboard, the far end viewer does not see the whiteboard in two different streams.
  • Referring now to FIG. 1, a conference room C configured for use in videoconferencing is illustrated. The conference room C is an exemplary near end location. Conference room C includes a conference table 10 and a series of chairs 12A-12F. A whiteboard 16 is located on one wall of the conference room C. A videoconferencing endpoint 498, which includes a camera 502 to view individuals seated in the various chairs 12A-F and the whiteboard 16 and a microphone array 504 to determine speaker direction, is provided at one end of the conference room C. A second camera and microphone array combination 510 is provided on one side of the conference room C and has a clearer view of the whiteboard 16. A third camera and microphone array 512 is provided on a side of the conference room C holding the whiteboard 16. A content camera 511 is mounted opposite the whiteboard 16 to capture the whiteboard 16 to provide as a content stream. A monitor or television 506 is provided to display the far end conference site or sites and generally to provide the loudspeaker output.
  • FIG. 2A illustrates a presenter P separated from the whiteboard 16. As there is no overlap between a framed view F1 of the presenter P and the whiteboard 16, the presenter P is framed normally and the content camera 511 provides the view of the whiteboard 16 as content in the videoconference. Because there is no overlap, there is no confusion by a viewer at the far end.
  • In FIG. 2B, the presenter P has moved to be in front of the whiteboard 16. The framed view F2 of the presenter P now overlaps the whiteboard 16. Therefore, the content camera 511 is no longer providing content to avoid potential viewer confusion. As the whiteboard 16 is empty, the framing view F2 of the presenter P is the same size as the framing view F1, as the size and location are based only on the presenter P, as there is nothing on the whiteboard 16 to display.
  • In FIG. 2C, the whiteboard 16 has been filled with two columns 200 and 202 of writing. The presenter P has not moved from FIG. 2B. As the presenter P is in front of a whiteboard 16 containing writing, the framing view F3 includes the presenter P and the entire whiteboard 16, as the whiteboard 16 is substantially filled with writing. No content is being provided from the content camera 511 as the presenter P is in front of the whiteboard 16 and the contents of the whiteboard 16 are provided in the framing view F3.
  • FIG. 2D is like FIG. 2C, except that the whiteboard 16 only contains writing in the left column 200. The right portion of the whiteboard 16 is empty. As the right portion of the whiteboard 16 is empty, that portion need not be shown in framing view F4, which is based on the presenter P and the left column 200. As with FIG. 2C, in FIG. 2D no content is being provided from the content camera 511.
  • By including the presence of the whiteboard 16 and any writing on the whiteboard 16 into the decisions for framing the presenter P, and appropriately controlling the transmission of the whiteboard as content, viewer confusion is reduced.
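  • As an illustration of the framing geometry of FIGS. 2A-2D, the following minimal Python sketch computes a framing rectangle from the presenter and whiteboard detections. It is a sketch only: the Box type, the pad() adjacency test, the 50-pixel margin and the 0.8 "substantially full" threshold are assumptions for illustration, not values taken from this disclosure.

      from dataclasses import dataclass

      FULL_THRESHOLD = 0.8  # assumed ratio above which the board is "substantially full"

      @dataclass
      class Box:
          """Axis-aligned bounding box in pixel coordinates."""
          x0: float
          y0: float
          x1: float
          y1: float

          def overlaps(self, other: "Box") -> bool:
              return not (self.x1 < other.x0 or other.x1 < self.x0 or
                          self.y1 < other.y0 or other.y1 < self.y0)

          def union(self, other: "Box") -> "Box":
              return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                         max(self.x1, other.x1), max(self.y1, other.y1))

      def pad(box: Box, margin: float = 50.0) -> Box:
          """Grow a box by an assumed margin so 'adjacent' writing can be tested as overlap."""
          return Box(box.x0 - margin, box.y0 - margin, box.x1 + margin, box.y1 + margin)

      def framing_view(presenter: Box, whiteboard, writing, fill_ratio: float) -> Box:
          """Select the region to frame, mirroring FIGS. 2A-2D."""
          if whiteboard is None or not presenter.overlaps(whiteboard):
              return presenter                    # FIG. 2A: presenter framed normally
          if writing is None or fill_ratio == 0.0:
              return presenter                    # FIG. 2B: an empty board is ignored
          if fill_ratio >= FULL_THRESHOLD or not writing.overlaps(pad(presenter)):
              return presenter.union(whiteboard)  # FIG. 2C: presenter plus whole board
          return presenter.union(writing)         # FIG. 2D: presenter plus written part
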
  • Referring now to FIG. 3, a high-level flowchart 300 of camera framing by a near end videoconference endpoint is illustrated. In step 302, video streams are received from any cameras and audio streams are received from any microphone arrays. In step 304, regions of interest are located. Regions of interest are objects or areas that are of interest in performing framing decisions. Regions of interest include conference participants but also objects in the conference room, such as the whiteboard 16 or any object to which the participants' views may be directed. In some examples according to the present disclosure, neural networks are trained for face and body finding and for detecting the presence of various objects that can be regions of interest. In the present examples, the objects include a whiteboard, including the amount and location of writing on the whiteboard. For a whiteboard, the output of the neural network can include not only a bounding box for the whiteboard but also outputs related to the amount and location of writing. In some examples, the amount and location of the writing is determined in a second neural network to simplify the training of the neural network performing the main region of interest detection. The bounding box information for the whiteboard is provided as an input to the specialized neural network to minimize the requirements of the specialized neural network. In some examples, the face and body finding are performed in one neural network and other regions of interest, such as the whiteboard, are detected in a different neural network, allowing simplification of each neural network and reuse of existing face and body finding neural networks. In those examples, the detection of any writing on the whiteboard can be performed by the region of interest detection neural network or in an additional neural network as described.
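  • The two-network arrangement described above might be orchestrated as in the sketch below. The networks are passed in as opaque callables because the disclosure does not prescribe particular architectures; the detection-record fields (label, box) and the writing network's (fill ratio, writing box) output are assumptions of the sketch.

      def detect_regions_of_interest(frame, roi_net, writing_net):
          """Stage 1: a general ROI network finds participants and whiteboards.
          Stage 2: a specialized network, given only the crop defined by the
          whiteboard's bounding box, estimates the amount and location of writing."""
          detections = roi_net(frame)  # assumed: records with .label and .box fields
          whiteboards = []
          for det in detections:
              if det.label != "whiteboard":
                  continue
              b = det.box
              crop = frame[int(b.y0):int(b.y1), int(b.x0):int(b.x1)]  # numpy-style [y, x]
              # Handing only the crop to the second network keeps its input domain
              # narrow, which is the training simplification noted above.
              fill_ratio, writing_box = writing_net(crop)
              whiteboards.append((det, fill_ratio, writing_box))
          return detections, whiteboards
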
  • In step 306, the audio streams from the microphone arrays are used for sound source localization (SSL), with the SSL results then used in combination with the video streams to find talkers. In the case of a presenter in front of a whiteboard, there is generally only a single talker to be framed.
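  • One simple way to fuse the SSL bearing with the video detections is to map each detected face to an azimuth and pick the face nearest the audio bearing, as sketched below under an assumed linear pixel-to-angle mapping and an assumed 90 degree horizontal field of view.

      import math

      def find_talker(face_boxes, image_width, ssl_azimuth_deg, hfov_deg=90.0):
          """Return the face box best matching the SSL bearing, with azimuth 0
          at the image center and positive angles to the right."""
          best, best_err = None, math.inf
          for box in face_boxes:
              center_x = (box.x0 + box.x1) / 2.0
              face_azimuth = ((center_x / image_width) - 0.5) * hfov_deg
              err = abs(face_azimuth - ssl_azimuth_deg)
              if err < best_err:
                  best, best_err = box, err
          return best
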
  • After the talkers are found in step 306, in step 308 the parties are framed as desired. Framing is usually based on the locations and numbers of talkers or participants to be framed. Examples according to the present disclosure add the location of a whiteboard into the framing considerations. Details of the framing according to examples of the present disclosure are provided in FIGS. 4A, 4B and 4C.
  • FIG. 4A provides details of step 308 for examples according to the present disclosure when a whiteboard may be involved. In step 402, a determination is made whether the ROIs, from step 304, include a whiteboard. If so, in step 404 a determination is made whether the presenter, the talker as determined based on SSL and video observations, is near the whiteboard. "Near" effectively means that the presenter is positioned such that a framed view of the presenter would include the whiteboard. If not, or if there is no whiteboard ROI, in step 406 any content from a whiteboard, as from the content camera 511, is provided as content in the videoconference. Operation proceeds to step 408, where normal framing operations are performed, as illustrated in FIG. 2A. These normal framing operations can be rule of thirds framing, centered framing, and the like.
  • If the presenter is near the whiteboard in step 404, in step 410 transmission of the whiteboard as content is discontinued. In step 412, it is determined if the whiteboard is empty. If so, the whiteboard need not be considered in framing determinations and operation proceeds to step 408, for framing as illustrated in FIG. 2B.
  • If the whiteboard is not empty, in step 414 it is determined if the whiteboard is substantially full of writing or the writing is only in portions not adjacent the presenter. If the whiteboard is full or the writing is not adjacent the presenter, in step 416 framing is based on the presenter and the entire whiteboard, as in FIG. 2C. If the whiteboard is only partially filled and the portion is adjacent the presenter, in step 418 framing is based on the presenter and just the portion of the whiteboard containing the writing, as in FIG. 2D.
  • In a simplified example, there is no evaluation of the amount or location of any writing on the whiteboard and the presenter is simply framed with the entire whiteboard when the presenter is near the whiteboard, so that the framing is as shown in FIG. 2C even if there is no writing or the writing is adjacent the presenter.
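  • Putting the steps together, one pass of the FIG. 4A decision flow might look like the following sketch, which reuses Box and framing_view from the earlier sketch. The ContentSource class is a hypothetical stand-in for whatever supplies the content stream, here the content camera 511.

      class ContentSource:
          """Hypothetical stand-in for the content camera 511; a real endpoint
          would signal its conferencing stack to start or stop the content stream."""
          def __init__(self):
              self.active = False

          def start(self):
              self.active = True

          def stop(self):
              self.active = False

      def frame_step(talker, whiteboard, writing, fill_ratio, content):
          """One pass of FIG. 4A: route the content stream and choose the framing."""
          # Steps 402/404: presenter-box overlap stands in for "the framed view
          # of the presenter would include the whiteboard".
          near = whiteboard is not None and talker.overlaps(whiteboard)
          if not near:
              content.start()                               # step 406: keep content up
              return framing_view(talker, None, None, 0.0)  # step 408: normal framing
          content.stop()                                    # step 410: avoid double view
          return framing_view(talker, whiteboard, writing, fill_ratio)  # steps 412-418
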
  • If the presenter is pacing, so that the whiteboard comes into and out of a framing view of the presenter, a situation might arise where the whiteboard content stream is rapidly and repeatedly turned on and off. This would be distracting to the viewer at the far end, so in some examples time delays are included after the determination of step 404 as shown in FIG. 4B. If the talker is not near the whiteboard in step 404, in step 450 it is determined if the talker has been away from the whiteboard for a desired period, such as five seconds. If so, the whiteboard is provided as content in step 406 and normal framing is performed in step 408. If the talker has not been away for the desired period, operation proceeds to step 410, with the whiteboard remaining discontinued as content.
  • If the talker is near the whiteboard in step 404, in step 452 it is determined if the talker has been near the whiteboard for a desired period, such as five seconds. If so, operation proceeds to step 410 and the provision of the whiteboard as content is discontinued. If the desired period has not elapsed, operation proceeds to step 406, where the whiteboard continues to be provided as content.
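  • The added delays of FIG. 4B amount to a debounce on the near/away decision of step 404: the decision only takes effect after the raw reading has been stable for the desired period. A minimal sketch, assuming the five second example period and a per-frame update:

      import time

      class NearWhiteboardDebouncer:
          """Report a change in the talker's near/away state only after the raw
          reading has been stable for hold_s seconds (steps 450 and 452)."""
          def __init__(self, hold_s: float = 5.0):
              self.hold_s = hold_s
              self.state = False    # debounced "talker is near the whiteboard"
              self._pending = None  # (candidate state, time it was first seen)

          def update(self, near_now: bool, now: float | None = None) -> bool:
              now = time.monotonic() if now is None else now
              if near_now == self.state:
                  self._pending = None             # raw reading agrees; nothing pending
              elif self._pending is None or self._pending[0] != near_now:
                  self._pending = (near_now, now)  # start timing the changed reading
              elif now - self._pending[1] >= self.hold_s:
                  self.state, self._pending = near_now, None  # stable: flip the state
              return self.state
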
  • Operation is similar even if the whiteboard is never provided as content, such as when there is no camera aimed at the whiteboard to operate as a content camera. This operation is shown in FIG. 4C. FIG. 4C is FIG. 4A with steps 406 and 410 removed, as the whiteboard is never to be provided as content. A similar modification can be made to FIG. 4B when the time delays of FIG. 4B are desired.
  • While whiteboards have been discussed above, it is understood that other objects are similar to whiteboards, so the term whiteboard as used herein is not limited to just dry erase whiteboards per se but includes similar items, such as smart or interactive whiteboards, flip charts, extra-large sticky notes, bulletin boards with paper on them, boards (including Kanban boards and scrum boards), clusters of sticky notes, a wall with a projected image from an interactive projector, etc., all of which are broadly considered as interactive group presentation devices.
  • While writing on the whiteboard has been discussed above, it is understood that writing is used broadly, so that other information besides the illustrated textual information, such as graphical information, pre-printed materials, etc. placed on or displayed by the whiteboard are classified as writing, all of which are broadly considered as information.
  • In the examples of this disclosure, a content camera 511 has been described as capturing the whiteboard to be provided as content. If the whiteboard is a smart or interactive whiteboard, the whiteboard itself may be providing the content image. If the whiteboard is an image projected by an interactive projector, the projector may be providing the content image. The transmission of the content image in either case would be controlled as described in FIGS. 4A and 4B, just in cooperation with the smart whiteboard or interactive projector instead of the content camera 511.
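  • Because the control logic of FIGS. 4A and 4B only needs to start and stop whatever produces the content image, the smart whiteboard or interactive projector case can reuse the ContentSource stand-in from the earlier sketch; the send() call and its message strings below are hypothetical placeholders for a real device's control API.

      class SmartWhiteboardSource(ContentSource):
          """Content supplied by a smart whiteboard or interactive projector
          rather than by content camera 511."""
          def __init__(self, device):
              super().__init__()
              self.device = device

          def start(self):
              self.device.send("content/start")  # assumed device message
              super().start()

          def stop(self):
              self.device.send("content/stop")   # assumed device message
              super().stop()
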
  • While the use of neural networks has been described to determine the presence of a whiteboard and the amount of writing on a whiteboard, it is understood that more conventional computer vision techniques can also be used.
  • In examples according to the present disclosure, the camera with the best view of the presenter P and whiteboard 16 is used for the framing operations and then transmitted to the far end. For example, in FIG. 1 that camera would be camera 510, absent a participant standing in front of the camera 510.
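  • One plausible reading of "best view", assumed for the sketch below rather than taken from the disclosure, is the camera in whose stream the presenter and whiteboard appear largest.

      def pick_camera(cameras):
          """cameras: objects with presenter_box and whiteboard_box attributes
          holding that camera's own detections (None when not visible)."""
          def area(box):
              if box is None:
                  return 0.0
              return max(0.0, box.x1 - box.x0) * max(0.0, box.y1 - box.y0)
          return max(cameras, key=lambda c: area(c.presenter_box) + area(c.whiteboard_box))
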
  • While this disclosure has focused on the use of a whiteboard in a conference room, it is understood that the whiteboard and presenter may be in many different settings, including a classroom, an auditorium, a lecture hall, a theater and so on.
  • Additionally, while the whiteboard 16 has been shown mounted on a wall, the whiteboard may also be freestanding or a portion of another object.
  • By including the whiteboard into presenter or talker framing decisions when the presenter is near or in front of the whiteboard, the experience of viewers at the far end is improved as confusion with provision of the whiteboard as content is reduced, particularly if the provision of whiteboard content is coordinated with the presenter framing decisions so that the whiteboard is not presented in both normal video stream and the content stream at the same time.
  • FIG. 5 illustrates an exemplary videoconferencing endpoint 498 as used at a near end or a far end according to the present disclosure. A codec 500, the processing unit of the videoconferencing endpoint 498, performs the necessary processing. In the illustrated example, a camera 502 and a microphone array 504 are included in the codec 500 to form an integrated unit, such as a bar. An external microphone 508 is connected to the codec 500 to be used on a conference room table. Cameras 510 and 512, which include integrated microphone arrays, are connected to the codec 500 to provide alternate or additional views or video streams. A content camera 511 is connected to the codec 500 to provide a content stream for use in the videoconference. A television or monitor 506, including a speaker, is also connected to the codec 500 to provide video and audio output. Additional monitors can be used if desired to provide greater flexibility in displaying conference participants and conference content.
  • The codec 500 is connected to a corporate or other local area network (LAN) 514. The corporate LAN 514 is connected to a firewall 516 and then the Internet 518 in a common configuration to allow communication with a remote endpoint 634 at a far end.
  • Details of the codec 500 are shown in FIG. 6. In the illustrated example, a system on chip (SoC) 600 is the primary component of the codec 500. The SoC 600 is similar to those used for cellular telephones and handheld equipment, such as a Tegra X1 or Qualcomm 835. The SoC 600 may be included as the main component on a system on module (SOM), such as nVidia™ Jetson TX1 or Intrinsyc™ Open-Q™ 835 System on Module. The SoC 600 contains the CPUs 601, DSP(s) 602, a GPU 606, onboard RAM 608, a video encode and decode module 614, an HDMI output module 616, a camera inputs module 618, a DRAM interface 610, a flash memory interface and an I/O module 622. The I/O module 622 provides audio inputs and outputs, such as I2S signals; USB interfaces; an SDIO interface; PCIe interfaces; an SPI interface; an I2C interface and various general purpose I/O pins (GPIO).
  • Cameras 510, 512 and content camera 511 are connected to the camera inputs module 618. The monitor and speaker 506 is connected to the HDMI output module 616. External DRAM 612 and a Wi-Fi/Bluetooth module 620 are connected to the SoC 600 to provide the needed bulk operating memory (RAM associated with each CPU and DSP is not shown) and additional I/O capabilities commonly used today. An audio codec 624 is connected to the SoC 600 to provide local analog line level capabilities. An analog microphone 508 is connected to the audio codec 624.
  • Preferably two network interface chips (NICs) 626, 628, such as Intel I210, are connected to the PCIe interfaces of the SoC 600. In the illustrated embodiment, NIC 626 is for connection to the corporate LAN 514 and then to IP microphones 632, the Internet 518 and remote or far end endpoints 634, while the other NIC 628 is used for local connection of IP-connected devices, such as IP microphones 630.
  • Flash memory 604 is connected to the SoC 600 to hold the programs that are executed by the CPUs 601 and DSPs 602 to provide the endpoint functionality of the codec 500, including the whiteboard and presenter framing discussed above. Illustrated modules include a video codec 650, camera control 652, face, body and ROI finding 653, neural network models 655, framing 654, other video processing 656, audio codec 658, audio processing 660, sound source localization 661, network operations 666, user interface 668 and operating system and various other modules 670. The RAM 608 and DRAM 612 are used for storing any of the modules in the flash memory 604 when the module is executing, storing video images of video streams and audio samples of audio streams, and can be used for scratchpad operation of the SoC 600. The neural network models 655 and face, body and ROI finding 653 are used with the framing 654 to perform the whiteboard and presenter detection and framing as described above for FIGS. 3 and 4 and illustrated in FIGS. 2A-2D.
  • FIG. 7 is a block diagram of an exemplary system on a chip (SoC) 700 as can be used as the SoC 600 in the codec 500. A series of more powerful microprocessors 702, such as ARM® A72 or A53 cores, form the CPUs 601 or primary general purpose processing block of the SoC 700, while a more powerful digital signal processor (DSP) 704 and multiple less powerful DSPs 705, together the DSPs 602, provide specialized computing capabilities. A simpler processor 706, such as ARM R5F cores, provides general control capability in the SoC 700. The more powerful microprocessors 702, more powerful DSP 704, less powerful DSPs 705 and simpler processor 706 each include various data and instruction caches, such as L1I, L1D, and L2D, to improve speed of operations. A high speed interconnect 708 connects the microprocessors 702, more powerful DSP 704, less powerful DSPs 705 and simpler processor 706 to various other components in the SoC 700. For example, a shared memory controller 710, which includes onboard memory or SRAM 608, is connected to the high speed interconnect 708 to act as the onboard SRAM for the SoC 700. A DDR (double data rate) memory controller system 714 is connected to the high speed interconnect 708 and acts as an external interface to external DRAM memory. A video acceleration module 716 and a radar processing accelerator (PAC) module 718 are similarly connected to the high speed interconnect 708. A neural network acceleration module 717 is provided for hardware acceleration of neural network operations. A vision processing accelerator (VPACC) module is the video encoder/decoder 614 and is connected to the high speed interconnect 708, as is a depth and motion PAC (DMPAC) module 722.
  • A graphics acceleration module 724 is connected to the high speed interconnect 708. A display subsystem as the HDMI output 616 is connected to the high speed interconnect 708 to allow operation with and connection to various video monitors. A system services block 732, which includes items such as DMA controllers, memory management units, general purpose I/Os, mailboxes, and the like, is provided for normal SoC 700 operation. A serial connectivity module 734 is connected to the high speed interconnect 708 and includes modules as normal in an SoC. A connectivity module 736 provides interconnects for external communication interfaces, such as PCIe block 738, USB block 740 and an Ethernet switch 742. A capture/MIPI module is the camera interface 618 and includes a four lane CSI-2 compliant transmit block 746 and a four lane CSI-2 receive module and hub.
  • An MCU island 760 is provided as a secondary subsystem and handles operation of the integrated SoC 700 when the other components are powered down to save energy. An MCU ARM processor 762, such as one or more ARM R5F cores, operates as a master and is coupled to the high speed interconnect 708 through an isolation interface 761. An MCU general purpose I/O (GPIO) block 764 operates as a slave. MCU RAM 766 is provided to act as local memory for the MCU ARM processor 762. A CAN bus block 768, an additional external communication interface, is connected to allow operation with a conventional CAN bus environment in a vehicle. An Ethernet MAC (media access control) block 770 is provided for further connectivity. External memory, generally non-volatile memory (NVM) such as flash memory 604, is connected to the MCU ARM processor 762 via an external memory interface 769 to store instructions loaded into the various other memories for execution by the various appropriate processors. The MCU ARM processor 762 operates as a safety processor, monitoring operations of the SoC 700 to ensure proper operation of the SoC 700.
  • It is understood that this is one example of an SoC provided for explanation and many other SoC examples are possible, with varying numbers of processors, DSPs, accelerators and the like.
  • A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. One general aspect includes a method of presenting a talker and a whiteboard to a far end of a videoconference. The method also includes receiving at least one video stream containing both the talker and the whiteboard. The method also includes determining the presence of the talker near the whiteboard. The method also includes when the talker is near the whiteboard, framing the talker and the whiteboard together for provision to the far end. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • Implementations may include one or more of the following features. The method may include determining the presence of writing on the whiteboard, where framing the talker and the whiteboard together is performed only when there is writing on the whiteboard. Determining the presence of writing on the whiteboard may include determining that the writing only partially fills the whiteboard and is adjacent to the talker, in which case framing the talker and the whiteboard together frames the talker and only the portion of the whiteboard adjacent to the talker that contains the writing. Determining the presence of writing on the whiteboard may instead include determining that the writing fills the whiteboard, in which case framing the talker and the whiteboard together frames the talker and the entire whiteboard. Where the near end environment further contains a camera for providing a view of the whiteboard as content in the videoconference, the method may include discontinuing provision of the whiteboard as content when the talker and the whiteboard are framed together, and may include continuing provision of the whiteboard as content when the talker is not near the whiteboard. Determining the presence of the talker near the whiteboard may include detecting regions of interest in the at least one video stream and determining if a region of interest is a whiteboard. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
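  • As an informal illustration only, the C sketch below captures the selection logic just summarized: frame the talker alone unless the talker is near a whiteboard bearing writing, frame the talker with the whole board when writing fills it, and frame the talker with just the written portion otherwise. The rectangle type, the proximity threshold, and the detector outputs are assumptions, not details taken from the application.

    /*
     * Illustration-only sketch of the framing decision; the detectors
     * that locate the talker, the whiteboard and any writing on it are
     * assumed to exist elsewhere, and only the selection logic follows
     * the method summarized above.
     */
    #include <stdbool.h>

    typedef struct { int x, y, w, h; } rect_t;

    /* Simple horizontal-gap test; the proximity criterion is an
     * implementation choice, not one fixed by the application. */
    static bool is_near(rect_t a, rect_t b, int max_gap_px)
    {
        int gap = (a.x > b.x + b.w) ? a.x - (b.x + b.w)
                : (b.x > a.x + a.w) ? b.x - (a.x + a.w) : 0;
        return gap <= max_gap_px;
    }

    /* Smallest rectangle containing both a and b. */
    static rect_t union_rect(rect_t a, rect_t b)
    {
        rect_t r;
        int ax2 = a.x + a.w, bx2 = b.x + b.w;
        int ay2 = a.y + a.h, by2 = b.y + b.h;
        r.x = a.x < b.x ? a.x : b.x;
        r.y = a.y < b.y ? a.y : b.y;
        r.w = (ax2 > bx2 ? ax2 : bx2) - r.x;
        r.h = (ay2 > by2 ? ay2 : by2) - r.y;
        return r;
    }

    /* Choose the crop sent to the far end for one video frame. */
    rect_t choose_framing(rect_t talker, rect_t whiteboard,
                          bool has_writing, bool writing_fills_board,
                          rect_t writing_region)
    {
        if (!is_near(talker, whiteboard, 200) || !has_writing)
            return talker; /* frame the talker alone */

        if (writing_fills_board)
            return union_rect(talker, whiteboard); /* talker plus board */

        /* Writing only partially fills the board: frame the talker
         * and just the written portion adjacent to the talker. */
        return union_rect(talker, writing_region);
    }

  In a real endpoint such a decision would drive the electronic pan/tilt/zoom cropping applied to the camera stream before it is sent to the far end.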
  • The above description is intended to be illustrative, and not restrictive. For example, the above-described embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.”

Claims (20)

1. A method of presenting a talker and a whiteboard to a far end of a videoconference, the near end environment containing the talker, the whiteboard and at least one video camera for providing a video stream to the far end, the at least one video camera having both the talker and the whiteboard within its field of view, the method comprising:
receiving at least one video stream containing both the talker and the whiteboard;
determining the presence of the talker near the whiteboard; and
when the talker is near the whiteboard, framing the talker and the whiteboard together for provision to the far end.
2. The method of claim 1, further comprising:
determining the presence of writing on the whiteboard, and
wherein framing the talker and the whiteboard together is performed only when there is writing on the whiteboard.
3. The method of claim 2, wherein determining the presence of writing on the whiteboard includes determining that the writing only partially fills the whiteboard and the writing is adjacent to the talker, and
wherein framing the talker and the whiteboard together frames the talker and only the portion of the whiteboard adjacent to the talker containing the writing when the writing only partially fills the whiteboard and the writing is adjacent to the talker.
4. The method of claim 2, wherein determining the presence of writing on the whiteboard includes determining that the writing fills the whiteboard, and
wherein framing the talker and the whiteboard together frames the talker and the entire whiteboard when the determining the presence of writing on the whiteboard determines that the writing fills the whiteboard.
5. The method of claim 1, the near end environment further containing a camera for providing a view of the whiteboard as content in the videoconference, the method further comprising:
discontinuing provision of the whiteboard as content when the talker and the whiteboard are framed together.
6. The method of claim 5, further comprising:
continuing provision of the whiteboard as content when the talker is not near the whiteboard.
7. The method of claim 1, wherein determining the presence of the talker near the whiteboard includes:
detecting regions of interest in the at least one video stream; and
determining if a region of interest is a whiteboard.
8. A videoconference endpoint for use in a near end environment containing a talker, an interactive group presentation device and at least one video camera for providing a video stream to a far end videoconference endpoint, the at least one video camera having both the talker and the interactive group presentation device within its field of view, comprising:
a processor;
a network interface coupled to the processor for connection to a far end videoconference endpoint;
a camera interface coupled to the processor for receiving at least one video stream having both the talker and the interactive group presentation device;
a video output interface coupled to the processor for providing a video stream to a display for presentation; and
memory coupled to the processor for storing instructions executed by the processor to perform the operations of:
receiving at least one video stream containing both the talker and the interactive group presentation device;
determining the presence of the talker near the interactive group presentation device; and
when the talker is near the interactive group presentation device, framing the talker and the interactive group presentation device together for provision to the far end.
9. The videoconference endpoint of claim 8, the memory further storing instructions executed by the processor to perform the operations of:
determining the presence of information on the interactive group presentation device, and
wherein framing the talker and the interactive group presentation device together is performed only when there is information on the interactive group presentation device.
10. The videoconference endpoint of claim 9, wherein determining the presence of information on the interactive group presentation device includes determining that the information only partially fills the interactive group presentation device and the information is adjacent to the talker, and
wherein framing the talker and the interactive group presentation device together frames the talker and only the portion of the interactive group presentation device adjacent to the talker containing the information when the information only partially fills the interactive group presentation device and the information is adjacent to the talker.
11. The videoconference endpoint of claim 9, wherein determining the presence of information on the interactive group presentation device includes determining that the information fills the interactive group presentation device, and
wherein framing the talker and the interactive group presentation device together frames the talker and the entire interactive group presentation device when the determining the presence of information on the interactive group presentation device determines that the information fills the interactive group presentation device.
12. The videoconference endpoint of claim 8, the near end environment further containing a camera for providing a view of the interactive group presentation device as content in the videoconference, the memory further storing instructions executed by the processor to perform the operations of:
discontinuing provision of the interactive group presentation device as content when the talker and the interactive group presentation device are framed together.
13. The videoconference endpoint of claim 12, the memory further storing instructions executed by the processor to perform the operations of:
continuing provision of the interactive group presentation device as content when the talker is not near the interactive group presentation device.
14. The videoconference endpoint of claim 8, wherein determining the presence of the talker near the interactive group presentation device includes:
detecting regions of interest in the at least one video stream; and
determining if a region of interest is an interactive group presentation device.
15. A non-transitory processor readable memory containing instructions that when executed cause a processor or processors to perform the following method of framing a talker, the near end environment containing a talker, a whiteboard and at least one video camera for providing a video stream to a far end, the at least one video camera having both the talker and the whiteboard within its field of view, the method comprising:
receiving at least one video stream containing both the talker and the whiteboard;
determining the presence of the talker near the whiteboard; and
when the talker is near the whiteboard, framing the talker and the whiteboard together for provision to the far end.
16. The non-transitory processor readable memory of claim 15, the method further comprising:
determining the presence of writing on the whiteboard, and
wherein framing the talker and the whiteboard together is performed only when there is writing on the whiteboard.
17. The non-transitory processor readable memory of claim 16, wherein determining the presence of writing on the whiteboard includes determining that the writing only partially fills the whiteboard and the writing is adjacent to the talker, and
wherein framing the talker and the whiteboard together frames the talker and only the portion of the whiteboard adjacent to the talker containing the writing when the writing only partially fills the whiteboard and the writing is adjacent to the talker.
18. The non-transitory processor readable memory of claim 16, wherein determining the presence of writing on the whiteboard includes determining that the writing fills the whiteboard, and
wherein framing the talker and the whiteboard together frames the talker and the entire whiteboard when the determining the presence of writing on the whiteboard determines that the writing fills the whiteboard.
19. The non-transitory processor readable memory of claim 15, the near end environment further containing a camera for providing a view of the whiteboard as content in the videoconference, the method further comprising:
discontinuing provision of the whiteboard as content when the talker and the whiteboard are framed together.
20. The non-transitory processor readable memory of claim 19, the method further comprising:
continuing provision of the whiteboard as content when the talker is not near the whiteboard.

Priority Applications (1)

US17/654,585 (published as US20220292801A1): priority date 2021-03-15, filing date 2022-03-12, "Formatting Views of Whiteboards in Conjunction with Presenters"

Applications Claiming Priority (2)

US202163161133P: priority date 2021-03-15, filing date 2021-03-15
US17/654,585 (published as US20220292801A1): priority date 2021-03-15, filing date 2022-03-12, "Formatting Views of Whiteboards in Conjunction with Presenters"

Publications (1)

US20220292801A1: published 2022-09-15

Family

ID=83195054

Family Applications (2)

US17/654,585 (US20220292801A1, pending): priority date 2021-03-15, filing date 2022-03-12, "Formatting Views of Whiteboards in Conjunction with Presenters"
US17/659,895 (US11696038B2, active): priority date 2021-03-15, filing date 2022-04-20, "Multiple camera color balancing"

Country Status (1)

US: 2 publications (US20220292801A1)



Also Published As

US11696038B2: published 2023-07-04
US20220294969A1: published 2022-09-15


Legal Events

2022-03-11 (AS): Assignment. Owner: PLANTRONICS, INC., CALIFORNIA. Assignors: SCHAEFER, STEPHEN PAUL; BRYAN, DAVID A; CHILDRESS, ROMMEL GABRIEL, JR. Reel/Frame: 059248/0024.
(STPP) Status: DOCKETED NEW CASE - READY FOR EXAMINATION.
2023-10-09 (AS): Nunc pro tunc assignment. Owner: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS. Assignor: PLANTRONICS, INC. Reel/Frame: 065549/0065.