WO2023212171A1 - Systems and methods for computer vision - Google Patents

Systems and methods for computer vision

Info

Publication number
WO2023212171A1
Authority
WO
WIPO (PCT)
Prior art keywords
tiles
image
images
videos
video
Prior art date
Application number
PCT/US2023/020165
Other languages
French (fr)
Inventor
Chaman Singh VERMA
Qiaogan WANG
Hua BAO
Qigong ZHENG
Shivakumar Mahadevappa
Subbu KUNAPULI
Vikram Khurana
Kongfeng Berger
Original Assignee
Avail Medsystems, Inc.
Priority date
Filing date
Publication date
Application filed by Avail Medsystems, Inc.
Publication of WO2023212171A1

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00ICT specially adapted for the handling or processing of medical images
    • G16H30/40ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00ICT specially adapted for the handling or processing of medical images
    • G16H30/20ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/24Indexing scheme for image data processing or generation, in general involving graphical user interfaces [GUIs]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10068Endoscopic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing

Definitions

  • Medical practitioners may perform various procedures within a medical suite, such as an operating room. Videos and/or images of the procedure may be captured and processed or analyzed to derive meaningful information about the procedure and/or the practitioner.
  • computer vision may refer to the use of computing systems to interpret visual information or data (e.g., video data or image data).
  • the video data or image data may comprise, for example, video sequences, views from multiple cameras, multi-dimensional data from a 3D scanner, or medical data obtained using a medical scanning or imaging device.
  • computer vision may involve using a trained algorithm or a neural network to process, analyze, and/or interpret video or image data.
  • the computer vision systems described herein may utilize artificial intelligence and/or machine learning to derive actionable insights for medical practitioners and/or other entities having a stake or interest in a procedure being performed by the medical practitioner.
  • the present disclosure provides a system for computer vision.
  • the system may comprise one or more imaging devices and an image processing module operatively coupled to the one or more imaging devices.
  • the image processing module is configured to process one or more images or videos obtained using the one or more imaging devices to generate an image or video montage comprising a plurality of tiles.
  • each tile of the plurality of tiles comprises a view of a portion of a target scene of interest.
  • the plurality of tiles is arranged in a predetermined or user-defined layout or configuration.
  • the image processing module is configured to perform auto cropping, zooming, and/or panning of the one or more images or videos. In some embodiments, the image processing module is configured to automatically crop the one or more images or videos. In some embodiments, the image processing module is configured to automatically crop the one or more images or videos by generating a bounding box around one or more visually salient features in the one or more images or videos. For example, the visually salient features may comprise faces, text, or other prominent objects of interest in the image. In some embodiments, the image processing module is configured to automatically crop the one or more images or videos by removing one or more undesirable regions or portions of the one or more images or videos.
  • the one or more undesirable regions or portions comprise a black region that does not provide or comprise data or information about a subject or a target scene or object of interest.
  • the system may improve the efficiency and speed of processing while helping to ensure that the relevant information is retained. Additionally or alternatively, autocropping may be useful in cases where images or videos need to be resized or reformatted for specific applications or devices, as it may allow for precise and automated cropping based on the most relevant information in the image or video.
  • the image processing module is configured to automatically crop the one or more images or videos to focus on or maximize a view of one or more regions or features of interest. In some embodiments, the image processing module is configured to automatically crop the one or more images or videos by utilizing one or more sweeping lines. In some embodiments, the image processing module is configured to use the one or more sweeping lines to (i) scan the one or more images or videos along one or more directions and (ii) identify or detect a change in a pixel value as the one or more sweeping lines scan along the one or more directions. In some embodiments, the change in the pixel value corresponds to a change from a first pixel value to a second pixel value. In some embodiments, the first pixel value is zero. In some embodiments, the second pixel value is non-zero. In some embodiments, the first pixel value corresponds to a black region of the one or more images or videos.
  • the image processing module is configured to automatically crop the one or more images or videos using edge detection.
  • the edge detection involves detecting when a pixel value changes from a first pixel value to a second pixel value.
  • the image processing module is configured to automatically crop the one or more images or videos as the one or more images or videos are obtained, captured, or received. In some embodiments, the image processing module is configured to crop the one or more images or videos at a predetermined time or frequency. In some embodiments, the predetermined time or frequency is set by a user.
  • the image processing module is configured to generate an auto cropped image or video.
  • the auto cropped image or video provides an enlarged or magnified view of a region of interest that does not include excessive black borders surrounding the region of interest.
  • the plurality of tiles of the image or video montage is arranged in an M x N array.
  • each tile of the plurality of tiles is associated with a medical imaging or camera input.
  • each tile of the plurality of tiles is associated with a different input source.
  • each tile of the plurality of tiles is associated with a same input source.
  • the image or video montage is configurable to allow a remote user to select one or more regions of interest for the medical imaging or camera input by manually drawing a bounding box.
  • the image processing unit is configured to automatically detect one or more regions of interest for the medical imaging or camera input.
  • the image or video montage is configurable to allow one or more remote participants or specialists to freeze one or more individual tiles and to draw on or telestrate an image or video associated with the individual tiles to indicate a region, object, or feature of interest.
  • each tile of the plurality of tiles is from a same source.
  • each tile of the plurality of tiles is from a different source.
  • one or more frames of the plurality of tiles are freezable to allow a remote specialist to draw on the one or more frames and/or to indicate a region, object, or feature of interest within the one or more frames.
  • different tiles of the plurality of tiles are associated with different timelines of a same input source.
  • the image processing unit is configured to perform or apply pan, tilt, and/or zoom on or to the one or more imaging devices.
  • the one or more imaging devices comprise a medical imaging device or a camera.
  • the pan, tilt, and/or zoom is enabled or controlled using hardware and/or one or more physical components.
  • the pan, tilt, and/or zoom is virtualized and/or enabled or controlled using software.
  • the virtualized pan, tilt, and/or zoom is enabled using a scaling factor that is generated dynamically based on an input resolution of the images or videos.
  • the scaling factor is customizable by a user input and/or controllable via a cloud server or a network.
  • the virtualized pan, tilt, and/or zoom is controllable using one or more virtualized pan, tilt, or zoom controls applicable to a two-dimensional (2D), three-dimensional (3D), or four-dimensional (4D) input.
  • the image processing unit is configured to implement image interpolation or image enhancement techniques to improve or enhance an image quality of the one or more images or videos.
  • an image quality of an image or a video associated with each tile of the plurality of tiles is controllable or adjustable by a user on an individual tile basis.
  • one or more image quality improvements are applied per tile.
  • one or more image quality improvements are applied to the entire montage.
  • the image processing unit is configured to apply image quality improvements to the one or more images or videos to compensate for or to address one or more undesirable lighting conditions.
  • the image quality improvements comprise HDR, SDR, and/or WDR processing.
  • the one or more undesirable lighting conditions comprise image washout or imaging bleaching effects.
  • the image quality improvements are applied individually per tile.
  • the image quality improvements are applied to the entire montage.
  • one or more tiles of the plurality of tiles are adjustable or customizable.
  • a size of the one or more tiles is adjustable or customizable.
  • a position or an orientation of the one or more tiles is adjustable or customizable.
  • the one or more tiles are adjustable or customizable based on a user input or a control command provided by the user.
  • Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
  • Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto.
  • the computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
  • FIG. 1 schematically illustrates an exemplary configuration for a computer vision system, in accordance with some embodiments.
  • FIGs. 2A and 2B schematically illustrate a method for auto cropping to remove black regions around an image or a video, in accordance with some embodiments.
  • FIGs. 3A and 3B schematically illustrate auto cropping of images, in accordance with some embodiments.
  • FIG. 4 and FIG. 5 schematically illustrate various examples of layouts that can be used to present or display the image or video collage or montage to a user, in accordance with some embodiments.
  • FIG. 6 schematically illustrates a plurality of image viewing modes that can be selected by a user, in accordance with some embodiments.
  • FIG. 7 schematically illustrates a zoom in and a zoom out operation that can be performed when the user is operating the system with the zoom mode selected, in accordance with some embodiments.
  • FIG. 8 schematically illustrates a panning operation that can be performed when the user is operating the system with the pan mode selected, in accordance with some embodiments.
  • FIG. 9 schematically illustrates an image bounding box and a tile bounding box that can intersect to define a region of interest, in accordance with some embodiments.
  • FIG. 10 schematically illustrates adjusting the relative position of an image compared to a field of view of a tile, in accordance with some embodiments.
  • FIG. 11 schematically illustrates an exemplary method for computer vision, in accordance with some embodiments.
  • FIG. 12 schematically illustrates a computer system programmed or otherwise configured to implement a method for computer vision, in accordance with some embodiments.
  • real time or “real-time,” as used interchangeably herein, generally refers to an event (e.g., an operation, a process, a method, a technique, a computation, a calculation, an analysis, a visualization, an optimization, etc.) that is performed using recently obtained (e.g., collected or received) data.
  • a real time event may be performed almost immediately or within a short enough time span, such as within at least 0.0001 millisecond (ms), 0.0005 ms, 0.001 ms, 0.005 ms, 0.01 ms, 0.05 ms, 0.1 ms, 0.5 ms, 1 ms, 5 ms, 0.01 seconds, 0.05 seconds, 0.1 seconds, 0.5 seconds, 1 second, or more.
  • a real time event may be performed almost immediately or within a short enough time span, such as within at most 1 second, 0.5 seconds, 0.1 seconds, 0.05 seconds, 0.01 seconds, 5 ms, 1 ms, 0.5 ms, 0.1 ms, 0.05 ms, 0.01 ms, 0.005 ms, 0.001 ms, 0.0005 ms, 0.0001 ms, or less.
  • FIG. 1 illustrates an exemplary configuration for a computer vision system.
  • the system may comprise one or more imaging devices.
  • the one or more imaging devices may comprise a camera or an imaging sensor.
  • the imaging devices may be coupled to a frame processing unit.
  • the frame processing unit may read in an image or video frame from the imaging devices and place the frame into a frame buffer.
  • the unit may then signal or wake up a source/tile thread.
  • the source/tile thread may read frames out of the frame buffer and perform video processing such as, for example, resizing, cropping, text detection, face detection, etc.
  • the thread may then put the processed frames into a montage frame buffer.
  • the system provided herein may efficiently read in images and video frames and process them through a variety of video processing techniques such as resizing, cropping, text detection, and face detection. This may allow for improved image and video quality and more detailed analysis of the input data.
  • the use of a separate source/tile thread for each input imaging device channel may further enhance the system's efficiency and accuracy by allowing for parallel processing of multiple input channels.
  • a montage thread may be triggered by a Video Sync Timer to loop through the montage frame buffer, take video frames from each channel, and perform image and/or video composition or compilation (in some cases with background text/video blending), before sending the output frames to an application or a device.
  • the system may utilize a buffering scheme.
  • the video frame buffer and the montage frame buffer may use a simple buffering scheme to reduce the queuing and processing delay.
  • a kernel-level timer may be used as a Video Sync signal to trigger the montage thread to generate an output of at least about 30 frames per second.
  • a kernel-level camera-input interrupt can be used as a camera call-back.
  • the end-to-end processing and buffering delay across the pipeline may be within the range of 2 to 3 video frames (60ms to 100ms).
  • the maximum video frame delay may be less than 4 video frames (132ms).
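  • The buffering and threading scheme described above can be illustrated with a short sketch. This is a minimal, hypothetical Python example (class and function names are not taken from the disclosure): each source has a single-slot frame buffer whose producer wakes a per-source tile thread, and a timer-driven montage thread composes the processed tiles at roughly 30 frames per second.

```python
import threading
import time
import numpy as np

class SingleSlotBuffer:
    """Single-frame buffer: keeping only the newest frame minimises queuing delay."""
    def __init__(self):
        self._frame, self._lock, self._ready = None, threading.Lock(), threading.Event()

    def put(self, frame):
        with self._lock:
            self._frame = frame
        self._ready.set()                      # wake the consuming tile thread

    def get(self, timeout=1.0):
        if not self._ready.wait(timeout):
            return None
        with self._lock:
            self._ready.clear()
            return self._frame

def tile_thread(src_buffer, montage_slots, index, tile_hw=(360, 640)):
    """Per-source thread: read frames, resize/crop them, and place them into the montage buffer."""
    while True:
        frame = src_buffer.get()
        if frame is not None:
            montage_slots[index] = np.resize(frame, (*tile_hw, 3))  # stand-in for a real resize

def montage_thread(montage_slots, fps=30):
    """Timer-driven loop composing all tiles into one output frame at roughly `fps`."""
    period = 1.0 / fps                         # stand-in for a kernel-level video sync timer
    while True:
        tiles = [t for t in montage_slots if t is not None]
        if tiles:
            output_frame = np.hstack(tiles)    # simple 1xN layout for illustration
            # hand `output_frame` to the application or display here
        time.sleep(period)
```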
  • the system may utilize parallel processing.
  • multiple threads may each process separate regions of a video frame in parallel, such as a left frame and right frame.
  • image processing may occur once for every N image, where N is an integer greater than 1.
  • text detection may occur once every 3 frames or 6 frames.
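  • The two optimisations above can be sketched as follows; this is an illustrative assumption of one way to split a frame into left and right halves for parallel processing and to run an expensive step such as text detection only once every N frames.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

TEXT_DETECT_EVERY_N = 3          # run the expensive step once every 3 frames

def process_region(region):
    return region                # stand-in for resizing, cropping, detection, etc.

def process_frame(frame, frame_index, pool):
    mid = frame.shape[1] // 2
    # process the left and right halves of the frame in parallel
    left_out, right_out = pool.map(process_region, [frame[:, :mid], frame[:, mid:]])
    if frame_index % TEXT_DETECT_EVERY_N == 0:
        pass                     # run text detection on this frame only
    return np.hstack([left_out, right_out])

pool = ThreadPoolExecutor(max_workers=2)
result = process_frame(np.zeros((720, 1280, 3), dtype=np.uint8), frame_index=0, pool=pool)
```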
  • one or more frames of the plurality of tiles are freezable to allow a remote specialist to draw on the one or more frames and/or to indicate a region, object, or feature of interest within the one or more frames.
  • different tiles of the plurality of tiles are associated with different timelines of a same input source.
  • an image quality of an image or a video associated with each tile of the plurality of tiles may be controllable or adjustable by a user on an individual tile basis.
  • one or more image quality improvements may be applied per tile.
  • one or more image quality improvements may be applied to the entire montage.
  • the one or more tiles of the plurality of tiles may be adjustable or customizable.
  • a size of the one or more tiles may be adjustable or customizable.
  • a position or an orientation of the one or more tiles may be adjustable or customizable.
  • the one or more tiles may be adjustable or customizable based on a user input or a control command provided by the user.
  • the system may be configured to generate an image montage.
  • doctors use various modalities to capture images, such as CT scans, MRI, cameras, ECG, etc. All of the images from different sources can be displayed on a single monitor by resizing or cropping the images on a suitable canvas.
  • the image montage may comprise a composition of photographs or videos that is generated by rearranging and/or overlapping two or more photographs or videos.
  • the layouts of the image montage may be predefined and/or predetermined. In some cases, the layouts may be customized and/or user-defined, as described elsewhere herein.
  • Each window in the layout may be referred to herein as a tile.
  • a plurality of tiles of the image or video montage may be arranged in an M x N array. In some embodiments, each tile of the plurality of tiles may be associated with a same input source. In some embodiments, each tile of the plurality of tiles may be associated with a different input source. For example, each tile of the plurality of tiles may be associated with a medical imaging or camera input.
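  • As one illustration of arranging tiles in an M x N array, the sketch below composes per-source tiles onto a single canvas with NumPy; the tile size and grid shape are illustrative assumptions rather than values specified in the disclosure.

```python
import numpy as np

def compose_montage(tiles, rows, cols, tile_h=360, tile_w=640):
    """Place up to rows*cols tiles onto a single canvas; empty slots stay black."""
    canvas = np.zeros((rows * tile_h, cols * tile_w, 3), dtype=np.uint8)
    for idx, tile in enumerate(tiles[: rows * cols]):
        r, c = divmod(idx, cols)
        patch = tile[:tile_h, :tile_w]          # stand-in for a proper resize to the tile size
        h, w = patch.shape[:2]
        canvas[r * tile_h : r * tile_h + h, c * tile_w : c * tile_w + w] = patch
    return canvas

# e.g. a 2 x 2 montage from four sources
sources = [np.full((360, 640, 3), i * 60, dtype=np.uint8) for i in range(4)]
montage = compose_montage(sources, rows=2, cols=2)
```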
  • the image or video montage may be configured to allow one or more remote participants or specialists to freeze one or more individual tiles and to draw on or telestrate an image or video associated with the individual tiles to indicate a region, object, or feature of interest.
  • the image or video montage may be viewable by one or more users in a video conference or session.
  • the video conference or session may permit one or more users to pick and/or choose one or more tiles of the plurality of tiles to view in the image or video montage.
  • each of the users may have a different set or subset of tiles selected for viewing.
  • a moderator of the video conference or session may control which tiles are visible to each of the one or more users.
  • different sets or subsets of tiles may be displayed to the users based on a specialty or an expertise of the users.
  • different sets or subsets of tiles may be displayed to the users for security or privacy purposes.
  • different sets or subsets of tiles may be displayed to the users to preserve bandwidth for the video conference or session.
  • the video conference or session may permit each user to create one or more montages with different tiles and to share and/or stream the one or more montages simultaneously.
  • the video conference or session may permit each user to create a customized montage by picking or selecting tiles from the one or more montages which are created by different users and shared and/or streamed simultaneously.
  • each user may be permitted to pick one or more tiles from the montage and create a localized tile for streaming or sharing with one or more other users in the video conference or session.
  • the localized tile may be capable of being marked, telestrated, annotated, or otherwise manipulated by the one or more users to indicate or identify an issue or a feature of interest.
  • text, graphics, or other data can be added or overlaid on the tile images or videos.
  • the text, graphics, or other data may be updated in real time based on newly obtained data (e.g., image data, video data, sensor data, etc.).
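  • A minimal sketch of overlaying a text label on a tile, assuming OpenCV's drawing API as one possible implementation choice; the label text is a placeholder and can be refreshed as new data arrives.

```python
import cv2
import numpy as np

def overlay_label(tile, text, origin=(10, 30)):
    """Return a copy of the tile with a text label drawn at `origin`."""
    annotated = tile.copy()
    cv2.putText(annotated, text, origin, cv2.FONT_HERSHEY_SIMPLEX,
                fontScale=0.8, color=(255, 255, 255), thickness=2)
    return annotated

tile = np.zeros((360, 640, 3), dtype=np.uint8)
tile = overlay_label(tile, "Source 1")   # redraw with updated text on each new frame
```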
  • the computer vision system may comprise an image processing module.
  • the image processing module may be configured to perform auto cropping, generate image or video collages or montages, resize images, and/or produce image tiles (e.g., regular or irregular tiles).
  • the image processing unit may be configured to apply image quality improvements to the one or more images or videos to compensate for or to address one or more undesirable lighting conditions.
  • the image quality improvements may comprise HDR, SDR, and/or WDR processing.
  • the one or more undesirable lighting conditions may comprise image washout or imaging bleaching effects.
  • the image quality improvements may be applied individually per tile.
  • the image quality improvements may be applied to the entire montage.
  • the image processing module may be configured to perform autocropping of images and/or videos.
  • Image cropping can be automated to automatically detect and provide a tight bounding box comprising one or more visually salient features, while removing black regions that provide no useful information, consume display real estate, and use up bandwidth while an image is being transferred over a network.
  • the autocropping can be used to remove regions in an image or video that do not comprise an object or feature of interest. Autocropping may also be used to remove small objects or outliers from an image or video tile. Autocropping can be used to focus on and maximize a view of one or more predetermined regions of interest.
  • FIGs. 2A and 2B illustrate a method for auto cropping to remove black regions around an image or a video.
  • a sweeping line method may be used to scan the image or video from four directions: (1) from top to bottom, (2) from bottom to top, (3) from left to right, and (4) from right to left.
  • the sweeping starts from the borders of the images and proceeds to the interior of the domain.
  • the sweeping stops when the line encounters a non-zero pixel value.
  • the auto crop function may involve edge detection (e.g., detecting when a pixel value changes from a zero value to a non-zero value).
  • text in an image can be detected, and auto crop can be used to remove the text from the image to maximize the amount of display real estate used to visualize the region of interest.
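  • The sweeping-line auto crop can be sketched compactly as follows. Scanning inward from each border until a non-zero pixel value is encountered is expressed here with NumPy row and column maxima, which is equivalent to the four sweeping lines; the threshold value is an assumption.

```python
import numpy as np

def autocrop_black_borders(image, threshold=0):
    """Crop away black borders by finding the first and last non-black row and column."""
    gray = image.mean(axis=2) if image.ndim == 3 else image
    rows = np.where(gray.max(axis=1) > threshold)[0]   # rows reached by the top/bottom sweeps
    cols = np.where(gray.max(axis=0) > threshold)[0]   # columns reached by the left/right sweeps
    if rows.size == 0 or cols.size == 0:
        return image                                   # entirely black: nothing to crop
    return image[rows[0] : rows[-1] + 1, cols[0] : cols[-1] + 1]
```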
  • FIG. 3A illustrates a medical image that may be obtained using an imaging device. The image may have black regions surrounding the region of interest.
  • FIG. 3B illustrates an auto cropped image. The auto cropped image may provide an enlarged or magnified view of the region of interest without excessive black borders surrounding the region of interest.
  • the image processing module may be configured to generate an image or video collage or montage.
  • the image or video collage or montage may comprise a combination, an assembly, a compilation, or an overlay of different images or videos (e.g., images or videos from different perspectives or from different image or video capture devices) to provide a more holistic visualization or multiple simultaneous visualizations of multiple scenes or views of a surgical procedure or a surgical site.
  • the image processing module may be configured to perform tiling.
  • the tiling may comprise arranging one or more images or videos in a desired format or layout.
  • the image processing module may generate or compose a large number of tiles.
  • the image processing module may generate customizable tiles.
  • a user may interact with the tiles (e.g., look at position operations or zoom operations).
  • the image processing module may be configured to adjust the gaps between tiles.
  • the image processing module may be configured to resize individual tiles or adjust the position and/or orientation of one tile relative to another tile.
  • the image or video source for the tiles may be selected or configured.
  • the geometry, layout, or attributes of the tiles may be configured.
  • the tiles may be swapped.
  • the background color of the tiles may be adjusted. Fonts and/or border widths may also be customized for each tile. The resolution of the images or videos displayed in each tile may be adjusted to preserve or fully utilize network bandwidth.
  • FIG. 4 and FIG. 5 illustrate various examples of layouts that can be used to present or display the image or video collage or montage to a user.
  • the layouts may comprise multiple tiles arranged next to each other.
  • the tiles may be of a same size or shape (e.g., to split the viewing real estate equally).
  • two or more tiles may have different sizes or shapes (e.g., to prioritize the viewing real estate for one particular tile).
  • the image processing module may be configured to resize images and/or videos obtained from an external input or source. In some cases, such resizing may be performed to facilitate zooming or autocropping of the images, as described elsewhere herein.
  • the images may be scaled. The scaling may involve classical scaling (e.g., linear, area, nearest neighbors, Lanczos, FFT, etc.) or super resolution techniques (e.g., EDSR, ESPCN, FSRCNN, LapSRN, etc.).
  • the scaling factor may be generated dynamically based on an input resolution of the images or videos.
  • the scaling factor may be utilized to enable a virtualized pan, tilt, and/or zoom function.
  • the scaling factor may be customizable by a user input and/or controllable via a cloud server or a network.
  • the virtualized pan, tilt, and/or zoom may be controllable using one or more virtualized pan, tilt, or zoom controls applicable to a two-dimensional (2D), three-dimensional (3D), or four-dimensional (4D) input.
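  • One way the dynamic scaling factor might be derived and applied for a virtualized zoom is sketched below; the target output resolution and the use of OpenCV's resize are assumptions made for illustration, not requirements of the disclosure.

```python
import cv2
import numpy as np

def virtual_zoom(frame, zoom, out_w=1280, out_h=720):
    """Scale `frame` by a factor derived from its input resolution times a zoom multiplier."""
    h, w = frame.shape[:2]
    base_scale = min(out_w / w, out_h / h)     # scaling factor generated dynamically from input resolution
    scale = base_scale * zoom                  # user- or cloud-controlled multiplier
    interp = cv2.INTER_LANCZOS4 if scale > 1 else cv2.INTER_AREA
    return cv2.resize(frame, None, fx=scale, fy=scale, interpolation=interp)

# e.g. a 2x virtual zoom on a 1920x1080 input
zoomed = virtual_zoom(np.zeros((1080, 1920, 3), dtype=np.uint8), zoom=2.0)
```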
  • a user may select between multiple viewing modes such as, for example, an original view or a full view, a cropped view, a pan view, or a zoom view.
  • the user may switch between the various viewing modes as desired, as shown in FIG. 6.
  • the original view or full view may correspond to the original image obtained using an imaging device.
  • the cropped view may provide a view of a region of interest in the image after auto cropping is performed.
  • the pan view may comprise a view of a region of interest in the image after a panning operation is performed.
  • the zoom view may comprise a view of a region of interest in the image after a zooming operation is performed.
  • the user may select a zoom pan mode.
  • the zoom pan mode may allow or permit zooming and panning to adjust a field of view and/or a region of interest.
  • the region of interest may comprise a portion of an image comprising an object, feature, or structure of interest.
  • Zooming may comprise zooming in or out and adjusting the field of view and/or resizing the field of view accordingly.
  • Panning may comprise adjusting the field of view to capture different portions or regions of an image.
  • the field of view may be adjusted in size or shape.
  • the field of view may be adjusted to provide a view of different portions or regions of an image.
  • FIG. 7 shows a zoom in and a zoom out operation that can be performed when the user is operating the system with the zoom mode selected.
  • FIG. 8 shows a panning operation that can be performed when the user is operating the system with the pan mode selected.
  • the panning operation may comprise moving the field of view or target image region of interest up, down, left, right, or any combination thereof.
  • the system may create real-time image montages or video compositions from different media sources, such as, for example, video or image databases, imaging devices, image or video outputs, etc.
  • the system may allow users to perform zooming or panning within the layout of the images or videos of the montage, to better view the details of an object of interest.
  • the system may generate an image bounding box based on a current zoom and shift factor.
  • the zoom and pan within the tile may be resized or shifted or scaled based on user input, and a region of interest within an intersection of the bounding box and a tile bounding box may be located.
  • the system provided herein may be configured to allow a remote user to select one or more regions of interest for the medical imaging or camera input by manually drawing a bounding box.
  • the portion of the image associated with the intersected area may be displayed to a user. This approach can remove the need for unnecessary cropping during pan and zoom, and allow the user to set a center of the image.
  • the relative position or orientation of the image compared to a field of view of a tile may be adjusted or shifted to effect a change of coordinates.
  • the intersection of the field of view of the tile and the original image may define a region of interest.
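  • A minimal sketch of locating the region of interest as the intersection of the zoomed and shifted image bounding box with the tile's field of view is shown below; representing boxes as (x, y, width, height) tuples is an assumption made for illustration.

```python
def intersect(box_a, box_b):
    """Return the overlap of two (x, y, w, h) boxes, or None if they do not overlap."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    if x2 <= x1 or y2 <= y1:
        return None                       # no overlap: nothing to display
    return (x1, y1, x2 - x1, y2 - y1)

def region_of_interest(image_box, tile_box, zoom=1.0, shift=(0, 0)):
    """Apply the current zoom and pan (shift) factors, then intersect with the tile's field of view."""
    x, y, w, h = image_box
    zoomed = (x + shift[0], y + shift[1], w * zoom, h * zoom)
    return intersect(zoomed, tile_box)

roi = region_of_interest(image_box=(0, 0, 1920, 1080), tile_box=(0, 0, 640, 360), zoom=1.5, shift=(100, 50))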
  • a user may set or adjust a view position for a particular image.
  • the user may reset the view position as desired (e.g., to reorient the user with a default view).
  • the system may record the images or videos displayed in tiles that are arranged in a predetermined or user-selected layout.
  • the recorded images or videos in the tile arrangement may be stored for future review or reference.
  • the recorded images or videos in the tile arrangement may be provided to a data storage unit or a cloud storage unit.
  • PHI / Anonymization / Face detection
  • the images or videos obtained may contain personal health information (PHI).
  • the systems disclosed herein may be configured to anonymize the images or videos by removing or redacting the PHI.
  • PHI may include demographic information, birth date, name, address, phone numbers, fax numbers, email addresses, social security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate or license numbers, vehicle identifiers or license plates, device identifiers or serial numbers, IP addresses, biometric IDs such as fingerprint or voice print, photos showing identifying characteristics, medical histories, test and laboratory results, physical or mental health conditions, insurance information and other data that can be used to identify an individual or determine a health condition of the individual.
  • the system may be used by a company representative to participate in video conferencing or streaming with a surgeon and principal individual near the surgical table.
  • the video call can be initiated from a console to a remote user, and the remote specialist can control what is being broadcast from the console. In some cases, this can involve at least two cameras (top and front) and a plurality of external inputs/medical imaging devices which can be connected to the console.
  • the system can remove PHI related information from the streamed / stored video content.
  • audio PHI may also be removed or redacted using audio processing.
  • PHI may be detected and redacted or removed in part based on text detection and recognition and/or face detection.
  • PHI may be detected and redacted or removed in part based on audio detection of certain key words correlated with or related to PHI.
  • PHI detection and recognition can be a very CPU-intensive operation. For this reason, it can be performed in two parts using edge computing (on a console) and cloud computing (in the cloud).
  • edge computing may be based on best effort basis and may remove at least a portion of the PHI information.
  • PHI can also be removed from external modalities which are the main source of PHI.
  • PHI information in external modalities is generally in static or predefined locations. In some cases, a single frame can be processed, and processing need not occur again for the next N frames until the scene changes or a modality change occurs. This can decrease CPU usage significantly.
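  • A hedged sketch of this approach is shown below: faces (one class of PHI) are detected on every Nth frame, and the detected regions are cached and masked on the frames in between. The use of OpenCV's Haar cascade face detector is an assumption; any face or text detector could be substituted.

```python
import cv2

DETECT_EVERY_N = 30
_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
_cached_boxes = []

def redact_phi(frame, frame_index):
    """Black out face regions; re-run detection only once every DETECT_EVERY_N frames."""
    global _cached_boxes
    if frame_index % DETECT_EVERY_N == 0:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _cached_boxes = _face_detector.detectMultiScale(gray, 1.1, 5)
    for (x, y, w, h) in _cached_boxes:
        frame[y : y + h, x : x + w] = 0     # mask the detected region
    return frame
```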
  • cloud computing may be utilized for advanced processing and verification of recorded content.
  • if a segment or frame contains a face, a handwritten document, or any document captured by the camera, that segment or frame can be individually processed to remove or mask any PHI-related material.
  • any remaining PHI information that is only partially masked or removed may be corrected as well using cloud computing techniques.
  • PHI related information can be masked or removed before sending out an image or a video to a viewer.
  • PHI detection can also happen before the stream is encoded, and the video quality would not be impacted.
  • the processing power may also be distributed, which can yield cost savings. In some cases, only the segments which contain PHI may be re-encoded using cloud computing.
  • the system may perform advanced processing, such as tool detection, object detection, or clinical insight generation or visualization.
  • the images or videos used to generate an image or video collage or montage may be captured using one or more imaging devices and one or more camera parameters.
  • the camera parameters may be adjusted for the images or videos displayed for a particular tile (e.g., based on user preference, or based on automatic optimizations by a machine learning or artificial intelligence system).
  • the parameters may include, for example, white balance, exposure control, frames per second, gain, saturation, contrast, brightness, backlight levels, hue, gamma, dynamic range, shutter speed, aperture, sharpness, color balance, depth of field, ISO, etc.
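  • As a hedged example, several of the listed parameters can be adjusted through OpenCV's VideoCapture property interface; which properties a given device honours is driver-dependent, and the device index and values below are purely illustrative.

```python
import cv2

cap = cv2.VideoCapture(0)                     # device index 0 is an assumption
cap.set(cv2.CAP_PROP_FPS, 30)                 # frames per second
cap.set(cv2.CAP_PROP_AUTO_WB, 0)              # disable automatic white balance
cap.set(cv2.CAP_PROP_EXPOSURE, -6)            # manual exposure (driver-specific scale)
cap.set(cv2.CAP_PROP_GAIN, 4)
cap.set(cv2.CAP_PROP_SATURATION, 60)
cap.set(cv2.CAP_PROP_CONTRAST, 50)
cap.set(cv2.CAP_PROP_BRIGHTNESS, 50)
cap.set(cv2.CAP_PROP_GAMMA, 100)
ok, frame = cap.read()
cap.release()
```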
  • FIG. 11 illustrates an exemplary method for computer vision.
  • the method may comprise using camera hardware to obtain one or more frames.
  • the one or more frames may be provided to a frame buffer associated with an image source.
  • the image source may correspond to the camera hardware used to obtain the one or more frames.
  • the one or more frames may be read, processed, and associated with one or more image tiles.
  • the one or more image tiles may be compiled or arranged to produce an image montage that can be transferred or output to an application or a display unit.
  • machine learning may be used to train the image processing algorithms of the present disclosure.
  • one or more data sets may be provided to a machine learning module.
  • the machine learning module may be configured to generate machine learning data based on the data sets.
  • the one or more data sets may be used as training data sets for one or more machine learning algorithms.
  • Learning data may be generated based on the data sets.
  • supervised learning algorithms may be used.
  • unsupervised learning techniques and/or semi-supervised learning techniques may be utilized in order to generate learning data.
  • the learning data may be used to train the machine learning module and/or the machine learning algorithms.
  • data may be fed back into the learning data sets to improve the machine learning algorithms.
  • the machine learning module may utilize one or more neural networks.
  • the one or more neural networks may comprise, for example, a deep convolutional neural network.
  • the machine learning may utilize any type of convolutional neural network (CNN). Shift invariant or space invariant neural networks (SIANN) may also be utilized. Image classification, object detection, and/or object localization may also be utilized.
  • the neural network may comprise a convolutional neural network (CNN).
  • the CNN may be, for example, U-Net, ImageNet, LeNet-5, AlexNet, ZFNet, GoogLeNet, VGGNet, ResNet18, or ResNet, etc.
  • the neural network may be, for example, a deep feed forward neural network, a recurrent neural network (RNN), LSTM (Long Short Term Memory), GRU (Gated Recurrent Unit), Auto Encoder, variational autoencoder, adversarial autoencoder, denoising auto encoder, sparse auto encoder, Boltzmann machine, RBM (Restricted BM), deep belief network, generative adversarial network (GAN), deep residual network, capsule network, attention/transformer networks, etc.
  • the neural network may comprise one or more neural network layers.
  • the neural network may have at least about 2 to 1000 or more neural network layers.
  • the machine learning algorithm may implement, for example, a random forest, a boosted decision tree, a classification tree, a regression tree, a bagging tree, a neural network, or a rotation forest.
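  • The disclosure does not prescribe an implementation, but as one illustration, a small convolutional neural network of the kind listed above could be defined as follows using PyTorch (an assumed framework choice); the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy convolutional classifier: two conv blocks, global pooling, and a linear head."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)

# e.g. classify a single 3-channel 224x224 frame
logits = SmallCNN()(torch.zeros(1, 3, 224, 224))
```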
  • FIG. 12 shows a computer system 1201 that is programmed or otherwise configured to implement a method for computer vision.
  • the computer system 1201 may be configured to, for example, process one or more images or videos to generate an image or video montage comprising a plurality of tiles, each tile comprising a view of a portion of a target scene of interest.
  • the computer system 1201 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device.
  • the electronic device can be a mobile electronic device.
  • the computer system 1201 may include a central processing unit (CPU, also "processor” and “computer processor” herein) 1205, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
  • the computer system 1201 also includes memory or memory location 1210 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1215 (e.g., hard disk), communication interface 1220 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1225, such as cache, other memory, data storage and/or electronic display adapters.
  • the memory 1210, storage unit 1215, interface 1220 and peripheral devices 1225 are in communication with the CPU 1205 through a communication bus (solid lines), such as a motherboard.
  • the storage unit 1215 can be a data storage unit (or data repository) for storing data.
  • the computer system 1201 can be operatively coupled to a computer network ("network") 1230 with the aid of the communication interface 1220.
  • the network 1230 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
  • the network 1230 in some cases is a telecommunication and/or data network.
  • the network 1230 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
  • the network 1230, in some cases with the aid of the computer system 1201, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1201 to behave as a client or a server.
  • the CPU 1205 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
  • the instructions may be stored in a memory location, such as the memory 1210.
  • the instructions can be directed to the CPU 1205, which can subsequently program or otherwise configure the CPU 1205 to implement methods of the present disclosure. Examples of operations performed by the CPU 1205 can include fetch, decode, execute, and writeback.
  • the CPU 1205 can be part of a circuit, such as an integrated circuit.
  • One or more other components of the system 1201 can be included in the circuit.
  • the circuit is an application specific integrated circuit (ASIC).
  • the storage unit 1215 can store files, such as drivers, libraries and saved programs.
  • the storage unit 1215 can store user data, e.g., user preferences and user programs.
  • the computer system 1201 in some cases can include one or more additional data storage units that are located external to the computer system 1201 (e.g., on a remote server that is in communication with the computer system 1201 through an intranet or the Internet).
  • the computer system 1201 can communicate with one or more remote computer systems through the network 1230.
  • the computer system 1201 can communicate with a remote computer system of a user (e.g., a medical operator, a medical assistant, or a remote viewer monitoring the medical operation).
  • remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
  • the user can access the computer system 1201 via the network 1230.
  • Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1201, such as, for example, on the memory 1210 or electronic storage unit 1215.
  • the machine executable or machine readable code can be provided in the form of software.
  • the code can be executed by the processor 1205.
  • the code can be retrieved from the storage unit 1215 and stored on the memory 1210 for ready access by the processor 1205.
  • the electronic storage unit 1215 can be precluded, and machine-executable instructions are stored on memory 1210.
  • the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime.
  • the code can be supplied in a programming language that can be selected to enable the code to execute in a precompiled or as-compiled fashion.
  • aspects of the systems and methods provided herein can be embodied in programming.
  • Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
  • Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
  • Storage type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
  • another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
  • Non-volatile storage media including, for example, optical or magnetic disks, or any storage devices in any computer(s) or the like, may be used to implement the databases, etc. shown in the drawings.
  • Volatile storage media include dynamic memory, such as main memory of such a computer platform.
  • Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
  • Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
  • Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data.
  • Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
  • the computer system 1201 can include or be in communication with an electronic display 1235 that comprises a user interface (UI) 1240 for providing, for example, a portal for viewing an image or video montage comprising one or more image or video tiles.
  • the portal may be provided through an application programming interface (API).
  • a user or entity can also interact with various elements in the portal via the UI.
  • UIs include, without limitation, a graphical user interface (GUI) and a web-based user interface.
  • Methods and systems of the present disclosure can be implemented by way of one or more algorithms.
  • An algorithm can be implemented by way of software upon execution by the central processing unit 1205.
  • the algorithm may be configured to process one or more images or videos to generate an image or video montage comprising a plurality of tiles, each tile comprising a view of a portion of a target scene of interest.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Radiology & Medical Imaging (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides a system for computer vision. The system may comprise one or more imaging devices and an image processing module operatively coupled to the one or more imaging devices. The image processing module may be configured to process one or more images or videos obtained using the one or more imaging devices to generate an image or video montage comprising a plurality of tiles.

Description

SYSTEMS AND METHODS FOR COMPUTER VISION
CROSS-REFERENCE
[0001] This application claims priority to U.S. Provisional Patent Application No. 63/335,517 filed on April 27, 2022, which application is incorporated herein by reference in its entirety for all purposes.
BACKGROUND
[0002] Medical practitioners may perform various procedures within a medical suite, such as an operating room. Videos and/or images of the procedure may be captured and processed or analyzed to derive meaningful information about the procedure and/or the practitioner.
SUMMARY
[0003] The present disclosure provides systems and methods for computer vision. As used herein, computer vision may refer to the use of computing systems to interpret visual information or data (e.g., video data or image data). The video data or image data may comprise, for example, video sequences, views from multiple cameras, multi-dimensional data from a 3D scanner, or medical data obtained using a medical scanning or imaging device. In some cases, computer vision may involve using a trained algorithm or a neural network to process, analyze, and/or interpret video or image data. The computer vision systems described herein may utilize artificial intelligence and/or machine learning to derive actionable insights for medical practitioners and/or other entities having a stake or interest in a procedure being performed by the medical practitioner.
[0004] In one aspect, the present disclosure provides a system for computer vision. The system may comprise one or more imaging devices and an image processing module operatively coupled to the one or more imaging devices. In some embodiments, the image processing module is configured to process one or more images or videos obtained using the one or more imaging devices to generate an image or video montage comprising a plurality of tiles.
[0005] In some embodiments, each tile of the plurality of tiles comprises a view of a portion of a target scene of interest. In some embodiments, the plurality of tiles is arranged in a predetermined or user-defined layout or configuration.
[0006] In some embodiments, the image processing module is configured to perform auto cropping, zooming, and/or panning of the one or more images or videos. In some embodiments, the image processing module is configured to automatically crop the one or more images or videos. In some embodiments, the image processing module is configured to automatically crop the one or more images or videos by generating a bounding box around one or more visually salient features in the one or more images or videos. For example, the visually salient features may comprise faces, text, or other prominent objects of interest in the image. In some embodiments, the image processing module is configured to automatically crop the one or more images or videos by removing one or more undesirable regions or portions of the one or more images or videos. In some embodiments, the one or more undesirable regions or portions comprise a black region that does not provide or comprise data or information about a subject or a target scene or object of interest. By automatically cropping the images or videos, the system may improve the efficiency and speed of processing while helping to ensure that the relevant information is retained. Additionally or alternatively, autocropping may be useful in cases where images or videos need to be resized or reformatted for specific applications or devices, as it may allow for precise and automated cropping based on the most relevant information in the image or video.
[0007] In some embodiments, the image processing module is configured to automatically crop the one or more images or videos to focus on or maximize a view of one or more regions or features of interest. In some embodiments, the image processing module is configured to automatically crop the one or more images or videos by utilizing one or more sweeping lines. In some embodiments, the image processing module is configured to use the one or more sweeping lines to (i) scan the one or more images or videos along one or more directions and (ii) identify or detect a change in a pixel value as the one or more sweeping lines scan along the one or more directions. In some embodiments, the change in the pixel value corresponds to a change from a first pixel value to a second pixel value. In some embodiments, the first pixel value is zero. In some embodiments, the second pixel value is non-zero. In some embodiments, the first pixel value corresponds to a black region of the one or more images or videos.
[0008] In some embodiments, the image processing module is configured to automatically crop the one or more images or videos using edge detection. In some embodiments, the edge detection involves detecting when a pixel value changes from a first pixel value to a second pixel value.
[0009] In some embodiments, the image processing module is configured to automatically crop the one or more images or videos as the one or more images or videos are obtained, captured, or received. In some embodiments, the image processing module is configured to crop the one or more images or videos at a predetermined time or frequency. In some embodiments, the predetermined time or frequency is set by a user.
[0010] In some embodiments, the image processing module is configured to generate an auto cropped image or video. In some embodiments, the auto cropped image or video provides an enlarged or magnified view of a region of interest that does not include excessive black borders surrounding the region of interest.
[0011] In some embodiments, the plurality of tiles of the image or video montage is arranged in an M x N array. In some embodiments, each tile of the plurality of tiles is associated with a medical imaging or camera input. In some embodiments, each tile of the plurality of tiles is associated with a different input source. In some embodiments, each tile of the plurality of tiles is associated with a same input source.
[0012] In some embodiments, the image or video montage is configurable to allow a remote user to select one or more regions of interest for the medical imaging or camera input by manually drawing a bounding box. In some embodiments, the image processing unit is configured to automatically detect one or more regions of interest for the medical imaging or camera input.
[0013] In some embodiments, the image or video montage is configurable to allow one or more remote participants or specialists to freeze one or more individual tiles and to draw on or telestrate an image or video associated with the individual tiles to indicate a region, object, or feature of interest. In some embodiments, each tile of the plurality of tiles is from a same source. In some embodiments, each tile of the plurality of tiles is from a different source. In some embodiments, one or more frames of the plurality of tiles are freezable to allow a remote specialist to draw on the one or more frames and/or to indicate a region, object, or feature of interest within the one or more frames. In some embodiments, different tiles of the plurality of tiles are associated with different timelines of a same input source.
[0014] In some embodiments, the image processing unit is configured to perform or apply pan, tilt, and/or zoom on or to the one or more imaging devices. In some embodiments, the one or more imaging devices comprise a medical imaging device or a camera. In some embodiments, the pan, tilt, and/or zoom is enabled or controlled using hardware and/or one or more physical components. In some embodiments, the pan, tilt, and/or zoom is virtualized and/or enabled or controlled using software. In some embodiments, the virtualized pan, tilt, and/or zoom is enabled using a scaling factor that is generated dynamically based on an input resolution of the images or videos. In some embodiments, the scaling factor is customizable by a user input and/or controllable via a cloud server or a network. In some embodiments, the virtualized pan, tilt, and/or zoom is controllable using one or more virtualized pan, tilt, or zoom controls applicable to a two-dimensional (2D), three-dimensional (3D), or four-dimensional (4D) input.
[0015] In some embodiments, the image processing unit is configured to implement image interpolation or image enhancement techniques to improve or enhance an image quality of the one or more images or videos. In some embodiments, an image quality of an image or a video associated with each tile of the plurality of tiles is controllable or adjustable by a user on an individual tile basis. In some embodiments, one or more image quality improvements are applied per tile. In some embodiments, one or more image quality improvements are applied to the entire montage. In some embodiments, the image processing unit is configured to apply image quality improvements to the one or more images or videos to compensate for or to address one or more undesirable lighting conditions. In some embodiments, the image quality improvements comprise HDR, SDR, and/or WDR processing. In some embodiments, the one or more undesirable lighting conditions comprise image washout or imaging bleaching effects. In some embodiments, the image quality improvements are applied individually per tile. In some embodiments, the image quality improvements are applied to the entire montage.
[0016] In some embodiments, one or more tiles of the plurality of tiles are adjustable or customizable. In some embodiments, a size of the one or more tiles is adjustable or customizable. In some embodiments, a position or an orientation of the one or more tiles is adjustable or customizable. In some embodiments, the one or more tiles are adjustable or customizable based on a user input or a control command provided by the user.
[0017] Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
[0018] Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
[0019] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
INCORPORATION BY REFERENCE
[0020] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:
[0022] FIG. 1 schematically illustrates an exemplary configuration for a computer vision system, in accordance with some embodiments.
[0023] FIGs. 2A and 2B schematically illustrate a method for auto cropping to remove black regions around an image or a video, in accordance with some embodiments.
[0024] FIGs. 3A and 3B schematically illustrate auto cropping of images, in accordance with some embodiments.
[0025] FIG. 4 and FIG. 5 schematically illustrate various examples of layouts that can be used to present or display the image or video collage or montage to a user, in accordance with some embodiments.
[0026] FIG. 6 schematically illustrates a plurality of image viewing modes that can be selected by a user, in accordance with some embodiments.
[0027] FIG. 7 schematically illustrates a zoom in and a zoom out operation that can be performed when the user is operating the system with the zoom mode selected, in accordance with some embodiments.
[0028] FIG. 8 schematically illustrates a panning operation that can be performed when the user is operating the system with the pan mode selected, in accordance with some embodiments.
[0029] FIG. 9 schematically illustrates an image bounding box and a tile bounding box that can intersect to define a region of interest, in accordance with some embodiments.
[0030] FIG. 10 schematically illustrates adjusting the relative position of an image compared to a field of view of a tile, in accordance with some embodiments.
[0031] FIG. 11 schematically illustrates an exemplary method for computer vision, in accordance with some embodiments.
[0032] FIG. 12 schematically illustrates a computer system programmed or otherwise configured to implement a method for computer vision, in accordance with some embodiments.
DETAILED DESCRIPTION
[0033] While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
[0034] Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.
[0035] Whenever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than,” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.
[0036] The term “real time” or “real-time,” as used interchangeably herein, generally refers to an event (e.g., an operation, a process, a method, a technique, a computation, a calculation, an analysis, a visualization, an optimization, etc.) that is performed using recently obtained (e.g., collected or received) data. In some cases, a real time event may be performed almost immediately or within a short enough time span, such as within at least 0.0001 millisecond (ms), 0.0005 ms, 0.001 ms, 0.005 ms, 0.01 ms, 0.05 ms, 0.1 ms, 0.5 ms, 1 ms, 5 ms, 0.01 seconds, 0.05 seconds, 0.1 seconds, 0.5 seconds, 1 second, or more. In some cases, a real time event may be performed almost immediately or within a short enough time span, such as within at most 1 second, 0.5 seconds, 0.1 seconds, 0.05 seconds, 0.01 seconds, 5 ms, 1 ms, 0.5 ms, 0.1 ms, 0.05 ms, 0.01 ms, 0.005 ms, 0.001 ms, 0.0005 ms, 0.0001 ms, or less.
[0037] Computer Vision
[0038] The present disclosure provides systems and methods for computer vision. FIG. 1 illustrates an exemplary configuration for a computer vision system. The system may comprise one or more imaging devices. The one or more imaging devices may comprise a camera or an imaging sensor.
[0039] The imaging devices may be coupled to a frame processing unit. The frame processing unit may read in an image or video frame from the imaging devices and place the frame into a frame buffer. The unit may then signal or wake up a source/tile thread. The source/tile thread may read frames out of the frame buffer and perform video processing such as, for example, resizing, cropping, text detection, face detection, etc. The thread may then put the processed frames into a montage frame buffer. In some cases, there may be a separate source/tile thread for each input imaging device channel. By coupling imaging devices to a frame processing unit, the system provided herein may efficiently read in images and video frames and process them through a variety of video processing techniques such as resizing, cropping, text detection, and face detection. This may allow for improved image and video quality and more detailed analysis of the input data. The use of a separate source/tile thread for each input imaging device channel may further enhance the system's efficiency and accuracy by allowing for parallel processing of multiple input channels.
[0040] In some cases, a montage thread may be triggered by a Video Sync Timer to loop through the montage frame buffer, take video frames from each channel, and perform image and/or video composition or compilation (in some cases with background text/video blending), before sending the output frames to an application or a device. In some cases, the system may utilize a buffering scheme. The video frame buffer and the montage frame buffer may use a simple buffering scheme to reduce the queuing and processing delay.
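By way of illustration only, the buffering scheme described above can be sketched as a producer-consumer arrangement in which each input channel overwrites a single-slot buffer and a timer-driven montage loop reads the newest frame from every channel. The sketch below assumes OpenCV capture sources and a fixed 30 frames-per-second software timer; the function names, buffer layout, and simple 1 x N composition are illustrative assumptions rather than the implementation described in this disclosure.

```python
import threading
import time
import cv2
import numpy as np

latest_frames = {}             # newest frame per channel (hypothetical shared buffer)
frames_lock = threading.Lock()

def source_thread(channel_id, device_index, tile_size=(640, 360)):
    """Read frames from one imaging device, resize them, and publish the newest frame."""
    cap = cv2.VideoCapture(device_index)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        tile = cv2.resize(frame, tile_size)        # per-tile processing would go here
        with frames_lock:
            latest_frames[channel_id] = tile        # single-slot buffer: overwrite, don't queue
    cap.release()

def montage_loop(fps=30):
    """Acts as a software 'Video Sync' timer: compose the newest frame from each channel."""
    period = 1.0 / fps
    while True:
        start = time.time()
        with frames_lock:
            tiles = list(latest_frames.values())
        if tiles:
            montage = np.hstack(tiles)              # trivial 1 x N layout for illustration
            cv2.imshow("montage", montage)          # display stands in for the output stage
            cv2.waitKey(1)
        time.sleep(max(0.0, period - (time.time() - start)))
```

Overwriting a single slot per channel keeps the queuing delay to at most one frame per stage, which is consistent with the goal of a small end-to-end processing and buffering delay.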
[0041] In some cases, a kernel-level timer may be used as a Video Sync signal to trigger the montage thread to generate an output of at least about 30 frames per second. In some cases, a kernel-level camera-input interrupt can be used as a camera call-back.
[0042] The end-to-end processing and buffering delay across the pipeline may be within the range of 2 to 3 video frames (60ms to 100ms). The maximum video frame delay may be less than 4 video frames (132ms).
[0043] In some embodiments, the system may utilize parallel processing. In some cases, multiple threads may each process separate regions of a video frame in parallel, such as a left frame and a right frame. Alternatively, image processing may occur once for every N images, where N is an integer greater than 1. For example, text detection may occur once every 3 frames or 6 frames.
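As a hypothetical illustration of amortizing an expensive step over several frames, the snippet below runs a detection function only once every N frames and reuses the latest result in between; the detector interface is a placeholder, not a component of this disclosure.

```python
def process_stream(frames, detect_text, every_n=3):
    """Run an expensive detection step only once every `every_n` frames,
    reusing the most recent result for the frames in between."""
    last_result = None
    for i, frame in enumerate(frames):
        if i % every_n == 0:
            last_result = detect_text(frame)   # expensive call, amortized over N frames
        yield frame, last_result
```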
[0044] In some embodiments, one or more frames of the plurality of tiles are freezable to allow a remote specialist to draw on the one or more frames and/or to indicate a region, object, or feature of interest within the one or more frames. In some embodiments, different tiles of the plurality of tiles are associated with different timelines of a same input source. In some embodiments, an image quality of an image or a video associated with each tile of the plurality of tiles may be controllable or adjustable by a user on an individual tile basis. In some embodiments, one or more image quality improvements may be applied per tile. In some embodiments, one or more image quality improvements may be applied to the entire montage. In some embodiments, the one or more tiles of the plurality of tiles may be adjustable or customizable. For example, a size of the one or more tiles may be adjustable or customizable. In another example, a position or an orientation of the one or more tiles may be adjustable or customizable. In some embodiments, the one or more tiles may be adjustable or customizable based on a user input or a control command provided by the user.
[0045] Montage
[0046] In some cases, the system may be configured to generate an image montage. In an operating room, doctors use various modalities to capture images, such as CT scan, MRI, cameras, ECG, etc. All of the images from different sources can be displayed on a single monitor by resizing or cropping the images on a suitable canvas. In some cases, the image montage may comprise a composition of photographs or videos that is generated by rearranging and/or overlapping two or more photographs or videos.
[0047] In some cases, the layouts of the image montage may be predefined and/or predetermined. In some cases, the layouts may be customized and/or user-defined, as described elsewhere herein. Each window in the layout may be referred to herein as a tile. In some embodiments, a plurality of tiles of the image or video montage may be arranged in an M x N array. In some embodiments, each tile of the plurality of tiles may be associated with a same input source. In some embodiments, each tile of the plurality of tiles may be associated with a different input source. For example, each tile of the plurality of tiles may be associated with a medical imaging or camera input. In some embodiments, the image or video montage may be configured to allow one or more remote participants or specialists to freeze one or more individual tiles and to draw on or telestrate an image or video associated with the individual tiles to indicate a region, object, or feature of interest. In some embodiments, the image or video montage may be viewable by one or more users in a video conference or session. For example, the video conference or session may permit one or more users to pick and/or choose one or more tiles of the plurality of tiles to view in the image or video montage. In some embodiments, each of the users may have a different set or subset of tiles selected for viewing. In some embodiments, a moderator of the video conference or session may control which tiles are visible to each of the one or more users. For example, different sets or subsets of tiles may be displayed to the users based on a specialty or an expertise of the users. In another example, different sets or subsets of tiles may be displayed to the users for security or privacy purposes. In some embodiments, different sets or subsets of tiles may be displayed to the users to preserve bandwidth for the video conference or session. In some embodiments, the video conference or session may permit each user to create one or more montages with different tiles and to share and/or stream the one or more montages simultaneously. In some embodiments, the video conference or session may permit each user to create a customized montage by picking or selecting tiles from the one or more montages which are created by different users and shared and/or streamed simultaneously. In some embodiments, each user may be permitted to pick one or more tiles from the montage and create a localized tile for streaming or sharing with one or more other users in the video conference or session. In some embodiments, the localized tile may be capable of being marked, telestrated, annotated, or otherwise manipulated by the one or more users to indicate or identify an issue or a feature of interest.
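A minimal sketch of composing such an M x N tile layout is shown below; the canvas dimensions, tile size, and row-major placement are assumptions made for the example only and are not requirements of the montage described above.

```python
import numpy as np
import cv2

def compose_montage(frames, rows, cols, tile_w=480, tile_h=270):
    """Resize each input frame to a common tile size and place it into an M x N grid.
    Empty cells (when there are fewer frames than rows * cols) are left black."""
    canvas = np.zeros((rows * tile_h, cols * tile_w, 3), dtype=np.uint8)
    for idx, frame in enumerate(frames[: rows * cols]):
        r, c = divmod(idx, cols)                      # row-major placement
        tile = cv2.resize(frame, (tile_w, tile_h))
        canvas[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w] = tile
    return canvas
```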
[0048] In some embodiments, text, graphics, or other data can be added or overlaid on the tile images or videos. In some cases, the text, graphics, or other data may be updated in real time based on newly obtained data (e.g., image data, video data, sensor data, etc.).
[0049] Image Processing
[0050] The computer vision system may comprise an image processing module. The image processing module may be configured to perform auto cropping, generate image or video collages or montages, resize images, and/or produce image tiles (e.g., regular or irregular tiles). In some embodiments, the image processing unit may be configured to apply image quality improvements to the one or more images or videos to compensate for or to address one or more undesirable lighting conditions. For example, the image quality improvements may comprise HDR, SDR, and/or WDR processing. In some embodiments, the one or more undesirable lighting conditions may comprise image washout or imaging bleaching effects. In some embodiments, the image quality improvements may be applied individually per tile. In some embodiments, the image quality improvements may be applied to the entire montage.
[0051] Auto Cropping
[0052] The image processing module may be configured to perform autocropping of images and/or videos. Image cropping can be automated to automatically detect and provide a tight bounding box comprising one or more visually salient features, while removing black regions that provide no useful information, consume display real estate, and use up bandwidth while an image is being transferred over a network.
[0053] The autocropping can be used to remove regions in an image or video that do not comprise an object or feature of interest. Autocropping may also be used to remove small objects or outliers from an image or video tile. Autocropping can be used to focus on and maximize a view of one or more predetermined regions of interest.
[0054] FIGs. 2A and 2B illustrate a method for auto cropping to remove black regions around an image or a video. In some cases, a sweeping line method may be used to scan the image or video from four directions: (1) from top to bottom, (2) from bottom to top, (3) from left to right, and (4) from right to left. The sweeping starts from the borders of the image and proceeds to the interior of the domain. The sweeping stops when the line encounters a non-zero pixel value. The auto crop function may involve edge detection (e.g., detecting when a pixel value changes from a zero value to a non-zero value). There may not be a limitation to the input image size, and the background need not be pitch black. Outlier pixels can be tolerated. In some cases, text in an image can be detected, and auto crop can be used to remove text from the image to maximize the amount of display real estate used to visualize a feature, object, or region of interest.
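A minimal sketch of such a sweeping-line auto crop, assuming a grayscale input and a small intensity tolerance so that near-black outlier pixels are ignored, might look as follows; the tolerance value is an illustrative assumption and not part of this disclosure.

```python
import numpy as np

def auto_crop_box(gray, tol=8):
    """Sweep in from each border of a grayscale image and return the tight bounding
    box (top, bottom, left, right) around pixels brighter than `tol`, or None if
    the whole image is background."""
    mask = gray > tol                        # near-black pixels count as background
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing any foreground pixel
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing any foreground pixel
    if rows.size == 0 or cols.size == 0:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]
```

The box found on a grayscale copy of a frame can then be used to crop the original color frame, for example via `frame[top:bottom + 1, left:right + 1]`.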
[0055] In some cases, auto crop may be performed automatically as images are obtained or received. In other cases, the auto crop detection frequency may be set or adjusted by a user.
[0056] FIG. 3A illustrates a medical image that may be obtained using an imaging device. The image may have black regions surrounding the region of interest. FIG. 3B illustrates an auto cropped image. The auto cropped image may provide an enlarged or magnified view of the region of interest without excessive black borders surrounding the region of interest.
[0057] Image Collage / Montage
[0058] In some cases, the image processing module may be configured to generate an image or video collage or montage. The image or video collage or montage may comprise a combination, an assembly, a compilation, or an overlay of different images or videos (e.g., images or videos from different perspectives or from different image or video capture devices) to provide a more holistic visualization or multiple simultaneous visualizations of multiple scenes or views of a surgical procedure or a surgical site.
[0059] Tiling
[0060] The image processing module may be configured to perform tiling. The tiling may comprise arranging one or more images or videos in a desired format or layout. The image processing module may generate or compose a large number of tiles. In some cases, the image processing module may generate customizable tiles. A user may interact with the tiles (e.g., look at position operations or zoom operations). In some cases, the image processing module may be configured to adjust the gaps between tiles. In some cases, the image processing module may be configured to resize individual tiles or adjust the position and/or orientation of one tile relative to another tile.
[0061] In some embodiments, the image or video source for the tiles may be selected or configured. In some embodiments, the geometry, layout, or attributes of the tiles may be configured. In some cases, the tiles may be swapped. In some cases, the background color of the tiles may be adjusted. Fonts and/or border widths may also be customized for each tile. The resolution of the images or videos displayed in each tile may be adjusted to preserve or fully utilize network bandwidth.
[0062] FIG. 4 and FIG. 5 illustrate various examples of layouts that can be used to present or display the image or video collage or montage to a user. In some cases, the layouts may comprise multiple tiles arranged next to each other. In some cases, the tiles may be of a same size or shape (e.g., to split the viewing real estate equally). In some cases, two or more tiles may have different sizes or shapes (e.g., to prioritize the viewing real estate for one particular tile).
[0063] Resizing / scaling
[0064] In some cases, the image processing module may be configured to resize images and/or videos obtained from an external input or source. In some cases, such resizing may be performed to facilitate zooming or autocropping of the images, as described elsewhere herein.
[0065] In some cases, the images may be scaled. The scaling may involve classical scaling (e.g., linear, area, nearest neighbors, Lanczos, FFT, etc.) or super resolution techniques (e.g., EDSR, ESPCN, FSRCNN, LapSRN, etc.).
[0066] In some embodiments, the scaling factor may be generated dynamically based on an input resolution of the images or videos. In some embodiments, the scaling factor may be utilized to enable a virtualized pan, tilt, and/or zoom function. In some embodiments, the scaling factor may be customizable by a user input and/or controllable via a cloud server or a network. In some embodiments, the virtualized pan, tilt, and/or zoom may be controllable using one or more virtualized pan, tilt, or zoom controls applicable to a two-dimensional (2D), three- dimensional (3D), or four-dimensional (4D) input.
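One possible way to realize a dynamically generated scaling factor is sketched below: the factor is derived from the input resolution relative to the tile resolution and modulated by a virtual zoom value. The fit rule and the choice of interpolation method are assumptions for illustration only.

```python
import cv2

def scale_to_tile(frame, tile_w, tile_h, zoom=1.0):
    """Derive a scaling factor from the input resolution and an optional virtual zoom,
    then resize with a classical interpolation method."""
    h, w = frame.shape[:2]
    factor = min(tile_w / w, tile_h / h) * zoom     # fit the frame inside the tile
    interp = cv2.INTER_AREA if factor < 1.0 else cv2.INTER_LINEAR
    resized = cv2.resize(frame, None, fx=factor, fy=factor, interpolation=interp)
    return resized, factor
```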
[0067] Multiple Modes
[0068] In some cases, a user may select between multiple viewing modes such as, for example, an original view or a full view, a cropped view, a pan view, or a zoom view. The user may switch between the various viewing modes as desired, as shown in FIG. 6.
[0069] The original view or full view may correspond to the original image obtained using an imaging device. The cropped view may provide a view of a region of interest in the image after auto cropping is performed. The pan view may comprise a view of a region of interest in the image after a panning operation is performed. The zoom view may comprise a view of a region of interest in the image after a zooming operation is performed.
[0070] In some cases, the user may select a zoom pan mode. The zoom pan mode may allow or permit zooming and panning to adjust a field of view and/or a region of interest. The region of interest may comprise a portion of an image comprising an object, feature, or structure of interest. Zooming may comprise zooming in or out and adjusting the field of view and/or resizing the field of view accordingly. Panning may comprise adjusting the field of view to capture different portions or regions of an image. The field of view may be adjusted in size or shape. The field of view may be adjusted to provide a view of different portions or regions of an image. FIG. 7 shows a zoom in and a zoom out operation that can be performed when the user is operating the system with the zoom mode selected. FIG. 8 shows a panning operation that can be performed when the user is operating the system with the pan mode selected. The panning operation may comprise moving the field of view or target image region of interest up, down, left, right, or any combination thereof.
[0071] In some cases, the system may create real-time image montages or video compositions from different media sources, such as, for example, video or image databases, imaging devices, image or video outputs, etc. The system may allow users to perform zooming or panning within the layout of the images or videos of the montage, to better view the details of an object of interest.
[0072] As shown in FIG. 9, in some cases, the system may generate an image bounding box based on a current zoom and shift factor. The zoom and pan within the tile may be resized or shifted or scaled based on user input, and a region of interest within an intersection of the bounding box and a tile bounding box may be located. In some embodiments, the system provided herein may be configured to allow a remote user to select one or more regions of interest for the medical imaging or camera input by manually drawing a bounding box. The portion of the image associated with the intersected area may be displayed to a user. This approach can remove the need for unnecessary cropping during pan and zoom, and allow the user to set a center of the image. As shown in FIG. 10, the relative position or orientation of the image compared to a field of view of a tile may be adjusted or shifted to effect a change of coordinates. The intersection of the field of view of the tile and the original image may define a region of interest.
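The intersection-based region of interest can be sketched as follows: the image bounding box is shifted and scaled by the current pan and zoom state, intersected with the fixed tile bounding box, and the overlapping portion of the image is displayed. The rectangle representation and parameter names below are assumptions made for the example, not the coordinate convention of this disclosure.

```python
def intersect(box_a, box_b):
    """Intersect two (x0, y0, x1, y1) rectangles; return None if they do not overlap."""
    x0, y0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x1, y1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def view_region(image, tile_w, tile_h, zoom=1.0, pan_x=0, pan_y=0):
    """Compute the visible region of `image` for a tile, given the current zoom and pan."""
    h, w = image.shape[:2]
    image_box = (pan_x, pan_y, pan_x + int(w * zoom), pan_y + int(h * zoom))
    tile_box = (0, 0, tile_w, tile_h)
    roi = intersect(image_box, tile_box)
    if roi is None:
        return None                                  # image panned fully out of the tile
    # map the intersection back into original image coordinates
    x0 = int((roi[0] - pan_x) / zoom); y0 = int((roi[1] - pan_y) / zoom)
    x1 = int((roi[2] - pan_x) / zoom); y1 = int((roi[3] - pan_y) / zoom)
    return image[y0:y1, x0:x1]                       # would then be scaled into the tile
```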
[0073] View Position
[0074] In some cases, a user may set or adjust a view position for a particular image. In some cases, the user may reset the view position as desired (e.g., to reorient the user with a default view).
[0075] Review
[0076] In some cases, the system may record the images or videos displayed in tiles that are arranged in a predetermined or user-selected layout. The recorded images or videos in the tile arrangement may be stored for future review or reference. In some cases, the recorded images or videos in the tile arrangement may be provided to a data storage unit or a cloud storage unit.
[0077] PHI/Anonymization/Face detection
[0078] In some cases, the images or videos obtained may contain personal health information (PHI). The systems disclosed herein may be configured to anonymize the images or videos by removing or redacting the PHI. PHI may include demographic information, birth date, name, address, phone numbers, fax numbers, email addresses, social security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate or license numbers, vehicle identifiers or license plates, device identifiers or serial numbers, IP addresses, biometric IDs such as fingerprint or voice print, photos showing identifying characteristics, medical histories, test and laboratory results, physical or mental health conditions, insurance information, and other data that can be used to identify an individual or determine a health condition of the individual.
[0079] In some cases, the system may be used by a company representative to participate in video conferencing or streaming with a surgeon and principal individual near the surgical table. The video call can be initiated from a console to a remote user, and the remote specialist can control what is being broadcasted from the console. In some cases, this can involve at least two cameras (top and front) and a plurality of external inputs/medical imaging devices which can be connected to the console. In order to be HIPAA compliant, the system can remove PHI-related information from the streamed / stored video content. In some cases, audio PHI may also be removed or redacted using audio processing. In some cases, PHI may be detected and redacted or removed in part based on text detection and recognition and/or face detection. In some cases, PHI may be detected and redacted or removed in part based on audio detection of certain key words correlated with or related to PHI.
[0080] PHI detection and recognition can be a very CPU-intensive operation. For this reason, it can be performed in two parts using edge computing (on a console) and cloud computing (on the cloud). In some cases, edge computing may be performed on a best-effort basis and may remove at least a portion of the PHI. PHI can also be removed from external modalities, which are the main source of PHI. PHI in external modalities is generally in static or predefined locations. In some cases, a single frame can be processed, and processing may not or need not occur for N number of frames until the scene changes or a modality change occurs. This can decrease CPU usage significantly. In some cases, cloud computing may be utilized for advanced processing and verification of recorded content. In the verification step, if a segment or frame contains any face, handwritten document, or other document that gets captured by the camera, that segment or frame can be individually processed to remove or mask any PHI-related material. In addition, if any PHI is sent out due to the best-effort processing, any remaining PHI that is only partially masked or removed may be corrected as well using cloud computing techniques.
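As a purely illustrative sketch of a best-effort edge pass, the snippet below blurs detected faces and reruns detection only every N frames; the OpenCV Haar cascade detector, the skip interval, and the blur settings are assumptions for the example and are not the PHI-removal method of this disclosure.

```python
import cv2

# Haar cascade face detector shipped with OpenCV, used here purely for illustration.
_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def redact_faces(frames, every_n=10):
    """Best-effort edge pass: detect faces once every `every_n` frames and blur the
    detected regions in every frame until the next detection."""
    faces = []
    for i, frame in enumerate(frames):
        if i % every_n == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = _face_detector.detectMultiScale(gray, 1.1, 5)
        for (x, y, w, h) in faces:
            frame[y:y + h, x:x + w] = cv2.GaussianBlur(
                frame[y:y + h, x:x + w], (51, 51), 0)
        yield frame
```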
[0081] There are several advantages of processing PHI in stages, including real-time processing, enhanced video quality, and lower processing power / cost. PHI-related information can be masked or removed before sending out an image or a video to a viewer. PHI detection can also happen before the stream is encoded, so the video quality is not impacted. The processing power may also be distributed, which can yield cost savings. In some cases, only the segments which contain PHI may be re-encoded in cloud computing.
[0082] Advanced Processing
[0083] In some cases, the system may perform advanced processing, such as tool detection, object detection, or clinical insight generation or visualization.
[0084] Camera Parameters
[0085] The images or videos used to generate an image or video collage or montage may be captured using one or more imaging devices and one or more camera parameters. In some cases, the camera parameters may be adjusted for the images or videos displayed for a particular tile (e.g., based on user preference, or based on automatic optimizations by a machine learning or artificial intelligence system). The parameters may include, for example, white balance, exposure control, frames per second, gain, saturation, contrast, brightness, backlight levels, hue, gamma, dynamic range, shutter speed, aperture, sharpness, color balance, depth of field, ISO, etc.
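Where the capture hardware and driver expose such controls, a subset of these parameters can be set programmatically; the snippet below assumes an OpenCV-compatible camera, and the property values are illustrative only (units and ranges are driver-specific).

```python
import cv2

cap = cv2.VideoCapture(0)                  # hypothetical camera index
cap.set(cv2.CAP_PROP_AUTO_WB, 0)           # disable automatic white balance
cap.set(cv2.CAP_PROP_EXPOSURE, -6)         # exposure; units depend on the driver
cap.set(cv2.CAP_PROP_FPS, 30)              # requested frames per second
cap.set(cv2.CAP_PROP_GAIN, 4)              # sensor gain
cap.set(cv2.CAP_PROP_SATURATION, 64)       # color saturation
cap.set(cv2.CAP_PROP_CONTRAST, 32)         # contrast
cap.set(cv2.CAP_PROP_BRIGHTNESS, 128)      # brightness
```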
[0086] Method
[0087] FIG. 11 illustrates an exemplary method for computer vision. The method may comprise using camera hardware to obtain one or more frames. The one or more frames may be provided to a frame buffer associated with an image source. The image source may correspond to the camera hardware used to obtain the one or more frames. The one or more frames may be read, processed, and associated with one or more image tiles. In some embodiments, the one or more image tiles may be compiled or arranged to produce an image montage that can be transferred or output to an application or a display unit.
[0088] Machine Learning
[0089] In any of the embodiments described herein, machine learning may be used to train the image processing algorithms of the present disclosure. In some cases, one or more data sets may be provided to a machine learning module. The machine learning module may be configured to generate machine learning data based on the data sets. The one or more data sets may be used as training data sets for one or more machine learning algorithms. Learning data may be generated based on the data sets. In some embodiments, supervised learning algorithms may be used. Optionally, unsupervised learning techniques and/or semi-supervised learning techniques may be utilized in order to generate learning data. The learning data may be used to train the machine learning module and/or the machine learning algorithms. In some cases, data may be fed back into the learning data sets to improve the machine learning algorithms.
[0090] In some embodiments, the machine learning module may utilize one or more neural networks. The one or more neural networks may comprise, for example, a deep convolutional neural network. The machine learning may utilize any type of convolutional neural network (CNN). Shift invariant or space invariant neural networks (SIANN) may also be utilized. Image classification, object detection, and/or object localization may also be utilized. In some embodiments, the neural network may comprise a convolutional neural network (CNN). The CNN may be, for example, U-Net, ImageNet, LeNet-5, AlexNet, ZFNet, GoogLeNet, VGGNet, ResNet18, or ResNet, etc. In some cases, the neural network may be, for example, a deep feed forward neural network, a recurrent neural network (RNN), LSTM (Long Short Term Memory), GRU (Gated Recurrent Unit), Auto Encoder, variational autoencoder, adversarial autoencoder, denoising auto encoder, sparse auto encoder, Boltzmann machine, RBM (Restricted BM), deep belief network, generative adversarial network (GAN), deep residual network, capsule network, attention/transformer networks, etc. In some embodiments, the neural network may comprise one or more neural network layers. The neural network may have at least about 2 to 1000 or more neural network layers. In some cases, the machine learning algorithm may implement, for example, a random forest, a boosted decision tree, a classification tree, a regression tree, a bagging tree, a neural network, or a rotation forest.
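As a hedged illustration of applying one of the listed CNN architectures to image classification, the sketch below assumes a recent torchvision installation; the choice of ResNet-18 and the preprocessing constants are assumptions made for the example and are not the training approach of this disclosure.

```python
import torch
from torchvision import models, transforms

# Load a pretrained ResNet-18 purely as an example backbone for image classification.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),                                   # HWC uint8 -> CHW float tensor
    transforms.Resize((224, 224)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),         # standard ImageNet statistics
])

def classify(image_rgb):
    """Return the index of the highest-scoring class for one RGB image (H x W x 3)."""
    with torch.no_grad():
        logits = model(preprocess(image_rgb).unsqueeze(0))
    return int(logits.argmax(dim=1))
```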
[0091] Computer Systems
[0092] In an aspect, the present disclosure provides computer systems that are programmed or otherwise configured to implement methods of the disclosure. FIG. 12 shows a computer system 1201 that is programmed or otherwise configured to implement a method for computer vision. The computer system 1201 may be configured to, for example, process one or more images or videos to generate an image or video montage comprising a plurality of tiles, each tile comprising a view of a portion of a target scene of interest. The computer system 1201 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.
[0093] The computer system 1201 may include a central processing unit (CPU, also "processor" and "computer processor" herein) 1205, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 1201 also includes memory or memory location 1210 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1215 (e.g., hard disk), communication interface 1220 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1225, such as cache, other memory, data storage and/or electronic display adapters. The memory 1210, storage unit 1215, interface 1220 and peripheral devices 1225 are in communication with the CPU 1205 through a communication bus (solid lines), such as a motherboard. The storage unit 1215 can be a data storage unit (or data repository) for storing data. The computer system 1201 can be operatively coupled to a computer network ("network") 1230 with the aid of the communication interface 1220. The network 1230 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 1230 in some cases is a telecommunication and/or data network. The network 1230 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 1230, in some cases with the aid of the computer system 1201, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1201 to behave as a client or a server.
[0094] The CPU 1205 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 1210. The instructions can be directed to the CPU 1205, which can subsequently program or otherwise configure the CPU 1205 to implement methods of the present disclosure. Examples of operations performed by the CPU 1205 can include fetch, decode, execute, and writeback.
[0095] The CPU 1205 can be part of a circuit, such as an integrated circuit. One or more other components of the system 1201 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
[0096] The storage unit 1215 can store files, such as drivers, libraries and saved programs. The storage unit 1215 can store user data, e.g., user preferences and user programs. The computer system 1201 in some cases can include one or more additional data storage units that are located external to the computer system 1201 (e.g., on a remote server that is in communication with the computer system 1201 through an intranet or the Internet).
[0097] The computer system 1201 can communicate with one or more remote computer systems through the network 1230. For instance, the computer system 1201 can communicate with a remote computer system of a user (e.g., a medical operator, a medical assistant, or a remote viewer monitoring the medical operation). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC's (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 1201 via the network 1230.
[0098] Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1201, such as, for example, on the memory 1210 or electronic storage unit 1215. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 1205. In some cases, the code can be retrieved from the storage unit 1215 and stored on the memory 1210 for ready access by the processor 1205. In some situations, the electronic storage unit 1215 can be precluded, and machine-executable instructions are stored on memory 1210.
[0099] The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a precompiled or as-compiled fashion.
[00100] Aspects of the systems and methods provided herein, such as the computer system 1201, can be embodied in programming. Various aspects of the technology may be thought of as "products" or "articles of manufacture" typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. "Storage" type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible "storage" media, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution.
[00101] Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media including, for example, optical or magnetic disks, or any storage devices in any computer(s) or the like, may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
[00102] The computer system 1201 can include or be in communication with an electronic display 1235 that comprises a user interface (UI) 1240 for providing, for example, a portal for viewing an image or video montage comprising one or more image or video tiles. The portal may be provided through an application programming interface (API). A user or entity can also interact with various elements in the portal via the UI. Examples of UI's include, without limitation, a graphical user interface (GUI) and web-based user interface.
[00103] Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 1205. For example, the algorithm may be configured to process one or more images or videos to generate an image or video montage comprising a plurality of tiles, each tile comprising a view of a portion of a target scene of interest.
[00104] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

CLAIMS
WHAT IS CLAIMED IS:
1. A system, comprising: one or more imaging devices; and an image processing module operatively coupled to the one or more imaging devices, wherein the image processing module is configured to process one or more images or videos obtained using the one or more imaging devices to generate an image or video montage comprising a plurality of tiles.
2. The system of claim 1, wherein each tile of the plurality of tiles comprises a view of a portion of a target scene of interest.
3. The system of claim 1, wherein the plurality of tiles is arranged in a predetermined or user-defined layout or configuration.
4. The system of claim 1, wherein the image processing module is configured to perform auto cropping, zooming, and/or panning of the one or more images or videos.
5. The system of claim 1, wherein the image processing module is configured to automatically crop the one or more images or videos.
6. The system of claim 5, wherein the image processing module is configured to automatically crop the one or more images or videos by generating a bounding box around one or more visually salient features in the one or more images or videos.
7. The system of claim 5, wherein the image processing module is configured to automatically crop the one or more images or videos by removing one or more undesirable regions or portions of the one or more images or videos.
8. The system of claim 7, wherein the one or more undesirable regions or portions comprise a black region that does not provide or comprise data or information about a subject or a target scene or object of interest.
9. The system of claim 5, wherein the image processing module is configured to automatically crop the one or more images or videos to focus on or maximize a view of one or more regions or features of interest.
10. The system of claim 5, wherein the image processing module is configured to automatically crop the one or more images or videos by utilizing one or more sweeping lines.
11. The system of claim 10, wherein the image processing module is configured to use the one or more sweeping lines to (i) scan the one or more images or videos along one or more directions and (ii) identify or detect a change in a pixel value as the one or more sweeping lines scan along the one or more directions.
12. The system of claim 11, wherein the change in the pixel value corresponds to a change from a first pixel value to a second pixel value.
13. The system of claim 12, wherein the first pixel value is zero.
14. The system of claim 12, wherein the second pixel value is non-zero.
15. The system of claim 13, wherein the first pixel value corresponds to a black region of the one or more images or videos.
16. The system of claim 5, wherein the image processing module is configured to automatically crop the one or more images or videos using edge detection.
17. The system of claim 16, wherein the edge detection involves detecting when a pixel value changes from a first pixel value to a second pixel value.
18. The system of claim 5, wherein the image processing module is configured to automatically crop the one or more images or videos as the one or more images or videos are obtained, captured, or received.
19. The system of claim 5, wherein the image processing module is configured to crop the one or more images or videos at a predetermined time or frequency.
20. The system of claim 19, wherein the predetermined time or frequency is set by a user.
21. The system of claim 5, wherein the image processing module is configured to generate an auto cropped image or video, wherein the auto cropped image or video provides an enlarged or magnified view of a region of interest that does not include excessive black borders surrounding the region of interest.
22. The system of claim 1, wherein the plurality of tiles of the image or video montage is arranged in an M x N array.
23. The system of claim 22, wherein each tile of the plurality of tiles is associated with a medical imaging or camera input.
24. The system of claim 22, wherein each tile of the plurality of tiles is associated with a different input source.
25. The system of claim 22, wherein each tile of the plurality of tiles is associated with a same input source.
26. The system of claim 23, wherein the image or video montage is configurable to allow a remote user to select one or more regions of interest for the medical imaging or camera input by manually drawing a bounding box.
27. The system of claim 23, wherein the image processing unit is configured to automatically detect one or more regions of interest for the medical imaging or camera input.
28. The system of claim 1, wherein the image or video montage is configurable to allow one or more remote participants or specialists to freeze one or more individual tiles and to draw on or telestrate an image or video associated with the individual tiles to indicate a region, object, or feature of interest.
29. The system of claim 1, wherein each tile of the plurality of tiles is from a same source.
30. The system of claim 1, wherein each tile of the plurality of tiles is from a different source.
31. The system of claim 1, wherein one or more frames of the plurality of tiles are freezable to allow a remote specialist to draw on the one or more frames and/or to indicate a region, object, or feature of interest within the one or more frames.
32. The system of claim 31, wherein different tiles of the plurality of tiles are associated with different timelines of a same input source.
33. The system of claim 1, wherein the image processing unit is configured to perform or apply pan, tilt, and/or zoom on or to the one or more imaging devices, wherein the one or more imaging devices comprise a medical imaging device or a camera.
34. The system of claim 33, wherein the pan, tilt, and/or zoom is enabled or controlled using hardware and/or one or more physical components.
35. The system of claim 33, wherein the pan, tilt, and/or zoom is virtualized and/or enabled or controlled using software.
36. The system of claim 35, wherein the virtualized pan, tilt, and/or zoom is enabled using a scaling factor that is generated dynamically based on an input resolution of the images or videos.
37. The system of claim 36, wherein the scaling factor is customizable by a user input and/or controllable via a cloud server or a network.
38. The system of claim 36, wherein the virtualized pan, tilt, and/or zoom is controllable using one or more virtualized pan, tilt, or zoom controls applicable to a two-dimensional (2D), three-dimensional (3D), or four-dimensional (4D) input.
39. The system of claim 1, wherein the image processing unit is configured to implement image interpolation or image enhancement techniques to improve or enhance an image quality of the one or more images or videos.
40. The system of claim 1, wherein an image quality of an image or a video associated with each tile of the plurality of tiles is controllable or adjustable by a user on an individual tile basis.
41. The system of claim 40, wherein one or more image quality improvements are applied per tile.
42. The system of claim 40, wherein one or more image quality improvements are applied to the entire montage.
43. The system of claim 1, wherein the image processing unit is configured to apply image quality improvements to the one or more images or videos to compensate for or to address one or more undesirable lighting conditions.
44. The system of claim 43, wherein the image quality improvements comprise HDR, SDR, and/or WDR processing.
45. The system of claim 43, wherein the one or more undesirable lighting conditions comprise image washout or imaging bleaching effects.
46. The system of claim 43, wherein the image quality improvements are applied individually per tile.
47. The system of claim 43, wherein the image quality improvements are applied to the entire montage.
48. The system of claim 1, wherein one or more tiles of the plurality of tiles are adjustable or customizable.
49. The system of claim 48, wherein a size of the one or more tiles is adjustable or customizable.
50. The system of claim 48, wherein a position or an orientation of the one or more tiles is adjustable or customizable.
51. The system of claim 48, wherein the one or more tiles are adjustable or customizable based on a user input or a control command provided by the user.
52. The system of claim 1, wherein the image or video montage is viewable by one or more users in a video conference or session.
53. The system of claim 52, wherein the video conference or session permits the one or more users to pick and/or choose one or more tiles of the plurality of tiles to view in the image or video montage.
54. The system of claim 53, wherein each of the users has a different set or subset of tiles selected for viewing.
55. The system of claim 52, wherein a moderator of the video conference or session controls which tiles are visible to each of the one or more users.
56. The system of claim 55, wherein different sets or subsets of tiles are displayed to the users based on a specialty or an expertise of the users.
57. The system of claim 55, wherein different sets or subsets of tiles are displayed to the users for security or privacy purposes.
58. The system of claim 55, wherein different sets or subsets of tiles are displayed to the users to preserve bandwidth for the video conference or session.
59. The system of claim 52, wherein the video conference or session permits each user to create one or more montages with different tiles and to share and/or stream the one or more montages simultaneously.
60. The system of claim 52, wherein the video conference or session permits each user to create a customized montage by picking or selecting tiles from the one or more montages which are created by different users and shared and/or streamed simultaneously.
61. The system of claim 52, wherein each user is permitted to pick one or more tiles from the montage and create a localized tile for streaming or sharing with one or more other users in the video conference or session.
62. The system of claim 61, wherein the localized tile is capable of being marked, telestrated, annotated, or otherwise manipulated by the one or more users to indicate or identify an issue or a feature of interest.
PCT/US2023/020165 2022-04-27 2023-04-27 Systems and methods for computer vision WO2023212171A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263335517P 2022-04-27 2022-04-27
US63/335,517 2022-04-27

Publications (1)

Publication Number Publication Date
WO2023212171A1 true WO2023212171A1 (en) 2023-11-02

Family

ID=88519631

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/020165 WO2023212171A1 (en) 2022-04-27 2023-04-27 Systems and methods for computer vision

Country Status (1)

Country Link
WO (1) WO2023212171A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170118540A1 (en) * 2014-06-27 2017-04-27 Koninklijke Kpn N.V. Determining A Region Of Interest On The Basis Of A HEVC-Tiled Video Stream
US20220076411A1 * 2019-05-29 2022-03-10 Leica Biosystems Imaging Inc. Neural network based identification of areas of interest in digital pathology images


Similar Documents

Publication Publication Date Title
CN107409166B (en) Automatic generation of panning shots
US10872420B2 (en) Electronic device and method for automatic human segmentation in image
US8396316B2 (en) Method and apparatus for processing image
US9491366B2 (en) Electronic device and image composition method thereof
US7869658B2 (en) Representative image selection based on hierarchical clustering
US9311310B2 (en) System and method for grouping related photographs
WO2020010974A1 (en) Image processing method and device, computer readable medium and electronic device
US20140098296A1 (en) Method and apparatus for changing a perspective of a video
WO2018058934A1 (en) Photographing method, photographing device and storage medium
US11044398B2 (en) Panoramic light field capture, processing, and display
KR20130071676A (en) Medical device and image displaying method using the same
US20170019615A1 (en) Image processing method, non-transitory computer-readable storage medium and electrical device thereof
CN105607825B (en) Method and apparatus for image processing
US20190102650A1 (en) Image extraction apparatus, image extraction method, image extraction program, and recording medium storing program
US9445073B2 (en) Image processing methods and systems in accordance with depth information
US9412042B2 (en) Interaction with and display of photographic images in an image stack
US20160054890A1 (en) Electronic apparatus, image processing method, and computer-readable recording medium
US9792021B1 (en) Transitioning an interface to a neighboring image
EP3151243B1 (en) Accessing a video segment
US9451161B2 (en) System and methods for video image processing
US20140160340A1 (en) Methods for Enhancing Perception of Image Quality at Capture Using Gaze Detection and Devices Thereof
WO2023212171A1 (en) Systems and methods for computer vision
CN113052763B (en) Fusion image generation method and device, computer equipment and storage medium
US9723216B2 (en) Method and system for generating an image including optically zoomed and digitally zoomed regions
CN109978761B (en) Method and device for generating panoramic picture and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23797268

Country of ref document: EP

Kind code of ref document: A1