CN116508070A - Visibility metrics in multi-view medical activity recognition systems and methods - Google Patents

Visibility metrics in multi-view medical activity recognition systems and methods

Info

Publication number
CN116508070A
Authority
CN
China
Prior art keywords
activity
sensor
image
visibility metric
imagery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180076376.9A
Other languages
Chinese (zh)
Inventor
O·莫哈雷里
A·T·施密特
A·莎吉卡尔甘罗迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intuitive Surgical Operations Inc
Original Assignee
Intuitive Surgical Operations Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intuitive Surgical Operations Inc filed Critical Intuitive Surgical Operations Inc
Priority claimed from PCT/US2021/059213 external-priority patent/WO2022104118A1/en
Publication of CN116508070A publication Critical patent/CN116508070A/en
Pending legal-status Critical Current

Abstract

Visibility metrics in multi-view medical activity recognition systems and methods are described herein. In some illustrative examples, a system accesses imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints. The system determines an activity visibility metric value for the first sensor during the medical session and based on the first imagery. The system facilitates adjusting the first viewpoint of the first sensor based on the activity visibility metric value.

Description

Visibility metrics in multi-view medical activity recognition systems and methods
RELATED APPLICATIONS
The present application claims priority from U.S. provisional patent application Ser. No. 63/141,830, filed January 26, 2021, U.S. provisional patent application Ser. No. 63/141,853, filed January 26, 2021, and U.S. provisional patent application Ser. No. 63/113,685, filed November 13, 2020, the contents of which are incorporated herein by reference in their entireties.
Background
Computer-implemented activity recognition typically involves capturing and processing images of a scene to determine characteristics of the scene. Conventional activity recognition may lack a desired level of accuracy and/or reliability for dynamic and/or complex environments. For example, some objects in dynamic and complex environments (such as those associated with surgical procedures) may be occluded from view of an imaging device.
Disclosure of Invention
The following description presents a simplified summary of one or more aspects of the systems and methods described herein. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present one or more aspects of the systems and methods described herein as a prelude to the more detailed description that is presented later.
An illustrative system includes a memory storing instructions and a processor communicatively coupled to the memory and configured to execute the instructions to access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determine an activity visibility metric value of the first sensor during the medical session and based on the first imagery; and facilitate adjustment of the first viewpoint of the first sensor based on the activity visibility metric value.
An illustrative method includes: accessing, by a processor, imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determining, by the processor, an activity visibility metric value of the first sensor during the medical session and based on the first imagery; and facilitating, by the processor, adjustment of the first viewpoint of the first sensor based on the activity visibility metric value.
An illustrative non-transitory computer-readable medium stores instructions executable by a processor to access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determine an activity visibility metric value of the first sensor during the medical session and based on the first imagery; and facilitate adjustment of the first viewpoint of the first sensor based on the activity visibility metric value.
An illustrative system includes a memory storing instructions and a processor communicatively coupled to the memory and configured to execute the instructions to access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determine a first classification of an activity of the scene based on the first imagery; determine an activity visibility metric value of the first sensor during the medical session and based on the first imagery; determine that the activity visibility metric value of the first sensor is below an activity visibility metric threshold; and, based on determining that the activity visibility metric value of the first sensor is below the activity visibility metric threshold, reduce a weight of the first classification of the activity of the scene for determining an overall classification of the activity of the scene based on the imagery of the scene.
An illustrative method includes accessing, by a processor, imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determining, by the processor, a first classification of an activity of the scene based on the first imagery; determining, by the processor, an activity visibility metric value of the first sensor during the medical session and based on the first imagery; determining, by the processor, that the activity visibility metric value of the first sensor is below an activity visibility metric threshold; and, based on determining that the activity visibility metric value of the first sensor is below the activity visibility metric threshold, reducing, by the processor, a weight of the first classification of the activity of the scene for determining an overall classification of the activity of the scene based on the imagery of the scene.
An illustrative non-transitory computer-readable medium stores instructions executable by a processor to access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determine a first classification of an activity of the scene based on the first imagery; determine an activity visibility metric value of the first sensor during the medical session and based on the first imagery; determine that the activity visibility metric value of the first sensor is below an activity visibility metric threshold; and, based on determining that the activity visibility metric value of the first sensor is below the activity visibility metric threshold, reduce a weight of the first classification of the activity of the scene for determining an overall classification of the activity of the scene based on the imagery of the scene.
Drawings
The accompanying drawings illustrate various embodiments and are a part of the specification. The illustrated embodiments are merely examples and do not limit the scope of the disclosure. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements.
Fig. 1 depicts an illustrative multi-view medical activity recognition system in accordance with principles described herein.
FIG. 2 depicts an illustrative processing system in accordance with the principles described herein.
Figs. 3-4 depict illustrative multi-view medical activity recognition systems in accordance with the principles described herein.
Fig. 5 depicts an illustrative computer-aided robotic surgical system in accordance with principles described herein.
Fig. 6 depicts an illustrative configuration of an imaging device attached to a robotic surgical system in accordance with principles described herein.
Figs. 7-8 depict illustrative methods in accordance with the principles described herein.
FIG. 9 depicts an illustrative computing device in accordance with the principles described herein.
Detailed Description
Systems and methods for multi-view medical activity recognition are described herein. The activity recognition system may include a plurality of sensors including at least two imaging devices configured to capture imagery of a scene from different viewpoints. The activity recognition system may determine one or more activity visibility metric values for the imagery captured by each imaging device, the one or more activity visibility metric values representing the visibility, in the imagery, of an activity of the scene. Based on the activity visibility metric values, the activity recognition system may facilitate adjusting one or more viewpoints of one or more of the imaging devices, such as to capture additional imagery from additional viewpoints, the additional imagery having higher activity visibility metric values than the initial imagery. Additionally or alternatively, the activity recognition system may utilize the activity visibility metric values to determine a classification of the activity of the scene.
In some examples, the scene may be a scene of a medical session such as a surgical session, and the activity may include phases of the surgical session. During the medical session, imagery of the scene may be captured by multiple imaging devices. The activity visibility metric values may be determined based both on the content of the imagery and on the activity within the medical session. When an activity visibility metric value indicates that an imaging device has a sub-optimal view of the activity of the scene, the activity recognition system may provide an output configured to facilitate a change in pose of the imaging device to capture imagery with a better view of the activity. Thus, the activity recognition system may dynamically adjust the configuration of the imaging devices to capture imagery from viewpoints that optimize the visibility of the activity of the scene.
The systems and methods described herein may provide various advantages and benefits. For example, the systems and methods described herein may provide accurate, dynamic, and/or flexible activity recognition. The illustrative examples of activity recognition described herein may be more accurate and/or flexible than conventional activity recognition based on single sensor activity recognition or fixed multi-sensor activity recognition. Illustrative examples of the systems and methods described herein may be well suited for activity recognition of dynamic and/or complex scenarios, such as scenarios associated with medical sessions.
Various illustrative embodiments will now be described in more detail. The disclosed systems and methods may provide one or more of the above benefits and/or various additional and/or alternative benefits that will be apparent herein.
Fig. 1 depicts an illustrative multi-view medical activity recognition system 100 (system 100). As shown, the system 100 may include a plurality of sensors (e.g., imaging devices 102-1 and 102-2, collectively, "imaging devices 102") positioned relative to the scene 104, and the imaging devices 102 may be configured to image the scene 104 by capturing images of the scene 104 simultaneously.
Scene 104 may include any environment and/or element of an environment that may be imaged by imaging device 102. For example, scene 104 may include a tangible real-world scene of physical elements. In some illustrative examples, scene 104 is associated with a medical session such as a surgical procedure. For example, the scene 104 may include a surgical scene at a surgical facility (such as an operating room or the like). For example, the scene 104 may include all or a portion of an operating room in which a surgical procedure may be performed on a patient. In some implementations, the scene 104 includes an area of an operating room proximate to a robotic surgical system used to perform a surgical procedure. In some implementations, the scene 104 includes an area within the patient. While certain illustrative examples described herein are directed to a scene 104 that includes a scene at a surgical facility, one or more principles described herein may be applied to other suitable scenes in other implementations.
Imaging devices 102 may include any imaging devices configured to capture imagery of scene 104. For example, imaging devices 102 may include video imaging devices, infrared imaging devices, visible light imaging devices, non-visible light imaging devices, intensity imaging devices (e.g., color, grayscale, black and white imaging devices), depth imaging devices (e.g., stereoscopic imaging devices, time-of-flight imaging devices, infrared imaging devices, etc.), endoscopic imaging devices, any other imaging devices, or any combination or sub-combination of such imaging devices. The imaging devices 102 may be configured to capture imagery of the scene 104 at any suitable capture rate. The imaging devices 102 may be synchronized in any suitable manner for synchronizing the capture of imagery of the scene 104. Synchronization may include the operation of the imaging devices being synchronized and/or the data sets output by the imaging devices being synchronized by matching the data sets to a common point in time.
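By way of a non-limiting illustration (not part of the original disclosure), the following Python sketch shows one way data sets output by multiple imaging devices could be synchronized by matching frames to a common point in time. The timestamped-frame structure and the tolerance value are assumptions made for illustration only.

```python
from bisect import bisect_left

def synchronize_frames(streams, reference_times, tolerance=0.05):
    """Match each stream's frames to reference timestamps (seconds).

    streams: dict mapping sensor id -> list of (timestamp, frame) tuples,
             sorted by timestamp (assumed format).
    Returns: dict mapping sensor id -> list of frames aligned to
             reference_times (None where no frame falls within tolerance).
    """
    aligned = {}
    for sensor_id, frames in streams.items():
        times = [t for t, _ in frames]
        matched = []
        for t_ref in reference_times:
            i = bisect_left(times, t_ref)
            # Consider the nearest neighbors on either side of t_ref.
            candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
            best = min(candidates, key=lambda j: abs(times[j] - t_ref), default=None)
            if best is not None and abs(times[best] - t_ref) <= tolerance:
                matched.append(frames[best][1])
            else:
                matched.append(None)
        aligned[sensor_id] = matched
    return aligned
```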
Fig. 1 shows a simple configuration of two imaging devices 102 positioned to capture imagery of a scene 104 from two different viewpoints. This configuration is exemplary. It will be appreciated that a multi-sensor architecture, such as a multi-view architecture, may include two or more imaging devices positioned to capture imagery of the scene 104 from two or more different viewpoints. The viewpoint of an imaging device 102 (i.e., the position, orientation, and view settings, such as the zoom, of the imaging device 102) determines the content of the imagery captured by the imaging device 102. The multi-sensor architecture may also include additional sensors positioned to capture data of the scene 104 from additional locations.
The system 100 may include a processing system 106 communicatively coupled to the imaging device 102. The processing system 106 may be configured to access the imagery captured by the imaging device 102 and determine a value of an activity visibility metric of the imaging device 102, as further described herein. The processing system 106 may utilize the value of the activity visibility metric to facilitate adjustment of the viewpoint of the imaging device 102 and/or to determine activity of the scene of the medical session (e.g., activity recognition). Such applications for activity visibility metrics are further described herein.
Fig. 2 illustrates an example configuration of a processing system 106 of a multi-view medical activity recognition system (e.g., system 100). The processing system 106 may include, but is not limited to, a storage device 202 and a processing device 204 that are selectively and communicatively coupled to each other. The devices 202 and 204 may each include or be implemented by one or more physical computing devices including hardware and/or software components, such as processors, memory, storage drives, communication interfaces, instructions stored in memory for execution by the processors, and the like. Although devices 202 and 204 are shown as separate devices in fig. 2, devices 202 and 204 may be combined into fewer devices, such as into a single device, or divided into more devices as may serve a particular implementation. In some examples, each of the devices 202 and 204 may be distributed across multiple devices and/or multiple locations as may serve a particular implementation.
The storage device 202 may hold (e.g., store) executable data used by the processing device 204 to perform any of the functions described herein. For example, the storage device 202 may store instructions 206 that may be executed by the processing device 204 to perform one or more of the operations described herein. The instructions 206 may be implemented by any suitable application, software, code, and/or other executable data example. The storage device 202 may also hold any data received, generated, managed, utilized, and/or transmitted by the processing device 204.
The processing device 204 may be configured to perform (e.g., execute the instructions 206 stored in the storage device 202 to perform) various operations associated with activity recognition, such as activity recognition of a scene of a medical session performed by a computer-assisted surgical system.
These and other exemplary operations that may be performed by the processing system 106 (e.g., by the processing device 204 of the processing system 106) are described herein. In the following description, any reference to functions performed by the processing system 106 may be understood as being performed by the processing device 204 based on the instructions 206 stored in the storage device 202.
Fig. 3 shows an example configuration of the processing system 106. As shown, the processing system 106 includes activity visibility modules 302 (e.g., activity visibility module 302-1 and activity visibility module 302-2). The activity visibility modules 302 may be configured to access imagery 304 (e.g., imagery 304-1 and imagery 304-2) captured by imaging devices (e.g., imaging devices 102) of an activity recognition system (e.g., system 100) and determine activity visibility metric values 306 (e.g., activity visibility metric value 306-1 and activity visibility metric value 306-2) based on the imagery 304. The processing system 106 also includes an activity classifier 308, which may generate an activity classification 310 based on the activity visibility metric values 306 and/or provide an output to facilitate one or more viewpoint adjustments 312 of the imaging devices 102.
For example, the activity visibility module 302-1 may receive the imagery 304-1 from the imaging device 102-1. The imagery 304-1 may include any image data representing a plurality of images, or one or more aspects of images, captured by the imaging device 102-1 of a scene (e.g., scene 104), such as a scene of a medical session. For example, the plurality of images may be one or more video clips comprising a series of images captured over a period of time. The video clip may capture one or more activities performed in the scene 104.
An activity may be any action performed by a person or system in the scene 104. In some examples, the activity may be specific to an action performed in association with a medical session, such as a predefined phase of the medical session. For example, a particular surgical session may include 10-20 (or any other suitable number of) different predefined phases, such as sterile preparation, patient roll-in, surgery, etc., which may constitute a defined set of activities from which the system 100 classifies the activity of the scene 104 as captured in a particular video clip.
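For illustration only, such a defined activity set could be represented as a simple enumeration. The phases beyond those named above (sterile preparation, patient roll-in, surgery) are hypothetical placeholders, not phases stated in the disclosure.

```python
from enum import Enum

class SurgicalPhase(Enum):
    # Phases named in the description; the remaining entries are
    # hypothetical placeholders for a session-specific activity set.
    STERILE_PREPARATION = 0
    PATIENT_ROLL_IN = 1
    SURGERY = 2
    PATIENT_ROLL_OUT = 3   # assumed
    ROOM_TURNOVER = 4      # assumed
```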
The activity visibility module 302-1 can access the imagery 304-1 (e.g., one or more video clips) in any suitable manner. For example, the activity visibility module 302-1 may receive the image 304-1 from the imaging device 102-1, retrieve the image 304-1 from the imaging device 102-1, receive and/or retrieve the image 304-1 from a storage device and/or any other suitable device communicatively coupled to the imaging device 102-1, and the like.
The activity visibility module 302-1 may determine an activity visibility metric value 306-1 based on the imagery 304-1. The activity visibility metric value 306-1 may include a score or any other metric that represents a rating of the degree of visibility of the activity of the scene 104 in the imagery. For example, the activity visibility metric value 306-1 may be a number between 1 and 5, where 5 represents the highest activity visibility and 1 represents the lowest activity visibility. The number may be implemented as an integer (i.e., the score may be one of 1, 2, 3, 4, or 5) or as any suitable rational number rounded to one, two, or any other suitable number of decimal places. Alternatively, activity visibility metric values may be implemented using any other suitable range and/or scale.
The activity visibility metric value 306-1 may be determined based on any suitable set of factors. In some examples, activity visibility metric value 306-1 may be based on a general visibility of the imagery 304-1 and/or a particular visibility of the activity in the imagery 304-1. The general visibility may correspond to the overall degree to which any content of the imagery 304-1 is visible in the imagery 304-1. For example, the general visibility may include factors such as the distance from the scene 104, the noise level in the imagery 304-1 captured by the imaging device 102-1, whether the imagery 304-1 is in focus, and whether the imagery 304-1 is overexposed.
The particular visibility of the activity, on the other hand, may be based on the degree to which the activity of the scene 104 is visible in the imagery 304-1, which may be separate from the general visibility. For example, two video clips may have similar general visibility (e.g., captured from similar distances from the scene 104 with similar content definition), but, based on the activity of the scene 104, the particular visibility of the activity (and, as a result, the activity visibility metric value 306-1) may differ because elements important for identifying the activity are visible in one video clip but not in the other. Example factors that may affect the particular visibility of an activity may include whether an object is occluding the activity of the scene 104, whether the imagery 304-1 captures important elements of the activity (e.g., objects of interest for the activity), and so forth. The particular visibility of the activity may additionally be affected by the general visibility of the imagery 304-1. For example, the particular visibility of the activity may be lower in imagery with low general visibility because all of the content of the imagery 304-1 (including the activity) is unclear.
Thus, the activity visibility module 302-1 may determine the activity visibility metric value 306-1 based on the imagery 304-1 (e.g., the content of the imagery 304-1) and the activity of the scene 104, where the content of the imagery 304-1 may be reflected in the general visibility of the imagery 304-1 (and, in some cases, in the particular visibility of the activity), and the activity of the scene 104 may affect the particular visibility of the activity in the imagery 304-1.
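As a minimal sketch (not the disclosed method), a clip-level activity visibility metric value could combine a general visibility score and an activity-specific visibility score, assuming both are available as values in [0, 1]; the weighting and the mapping onto the 1-5 scale described above are assumptions for illustration.

```python
def activity_visibility_metric(general_visibility, activity_visibility,
                               specific_weight=0.7):
    """Combine general and activity-specific visibility into a 1-5 score.

    general_visibility, activity_visibility: floats in [0, 1], e.g., produced
    by image-quality heuristics and an activity-recognition confidence.
    specific_weight: assumed relative importance of activity-specific visibility.
    """
    combined = (1.0 - specific_weight) * general_visibility \
               + specific_weight * activity_visibility
    # Map [0, 1] onto the 1-5 scale and round to one decimal place.
    return round(1.0 + 4.0 * combined, 1)
```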
The activity visibility modules 302 may determine the activity visibility metric values 306 in any suitable manner. For example, one or more machine learning algorithms may be utilized to train a machine learning module configured to predict activity visibility metric values 306 based on the imagery 304 and the activity of the scene 104. Such machine learning algorithms and models are further described herein. Additionally or alternatively, the activity visibility modules 302 may apply an activity recognition algorithm to the imagery 304 to recognize the activity of the scene 104. The activity recognition algorithm may also generate a confidence measure of the recognition of the activity, which may be used to determine an activity visibility metric value 306. Additionally or alternatively, the activity visibility modules 302 may receive information associated with the activity of the scene 104, which may be used to determine the activity visibility metric values 306. Information associated with the activity of the scene 104 may be received from any suitable source(s) (e.g., a robotic surgical system, user input, etc.) and may include any information related to the activity of the scene 104 and/or information from which the activity of the scene 104 may be derived.
The activity visibility metric values 306 may be output to an activity classifier 308, and the activity classifier 308 may determine an activity classification 310 based on the activity visibility metric values 306 and the imagery 304. For example, activity classifier 308 may determine individual classifications of the activity of the scene 104 based on imagery 304-1 and 304-2. The activity classifier 308 may then use the activity visibility metric values 306 to weight the individual classifications of the activity, using the activity visibility metric value 306-1 as (or as part of) a confidence measure of the classification of the activity based on the imagery 304-1 and using the activity visibility metric value 306-2 as (or as part of) a confidence measure of the classification of the activity based on the imagery 304-2. Based on the weighted classifications, the activity classifier 308 may determine an overall classification and output the overall classification as the activity classification 310. Additionally or alternatively, the activity classifier 308 may, in some cases, selectively utilize the activity visibility metric values 306 for generating the activity classification 310 when the individual classifications of the activity differ, and ignore the activity visibility metric values 306 when the individual classifications of the activity are the same. Additionally or alternatively, the activity classifier 308 may utilize an activity visibility metric threshold to determine whether to utilize one or more activity visibility metric values. For example, if an activity visibility metric value 306 is below the activity visibility metric threshold, the activity classifier 308 may ignore the corresponding imagery 304 and/or decrease the weight of the corresponding imagery 304 for classifying the activity. Additionally or alternatively, the activity classifier 308 may utilize the activity visibility metric values in any other suitable manner for determining the activity classification 310.
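A hedged sketch of one way such visibility-weighted fusion of per-view activity classifications could be implemented, including a threshold below which a view's vote is ignored. The per-view class probability vectors and the specific weighting scheme are assumptions for illustration, not the claimed method.

```python
import numpy as np

def fuse_classifications(per_view_probs, visibility_metrics,
                         visibility_threshold=2.0):
    """Fuse per-view activity class probabilities into an overall classification.

    per_view_probs: (num_views, num_classes) array of per-view class probabilities.
    visibility_metrics: (num_views,) array of activity visibility metric values (1-5).
    Views whose metric falls below visibility_threshold receive zero weight.
    Returns (overall_class_index, overall_probs).
    """
    probs = np.asarray(per_view_probs, dtype=float)
    weights = np.asarray(visibility_metrics, dtype=float).copy()
    weights[weights < visibility_threshold] = 0.0   # ignore low-visibility views
    if weights.sum() == 0.0:
        weights = np.ones_like(weights)             # fall back to unweighted voting
    overall = (weights[:, None] * probs).sum(axis=0) / weights.sum()
    return int(overall.argmax()), overall
```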
Further, the activity classifier 308 may determine and output a viewpoint adjustment 312 to facilitate adjusting a viewpoint of one or more of the imaging devices 102. For example, if the activity visibility metric value 306-1 is below the activity visibility metric threshold, the activity classifier 308 may facilitate adjusting the imaging device 102-1 to change a viewpoint of the imaging device 102-1 such that the imaging device 102-1 may capture additional imagery of the scene 104 from different viewpoints. For example, the activity classifier 308 may output a view adjustment 312, the view adjustment 312 including instructions to move the imaging device 102-1 relative to the scene 104 to change the view of the imaging device 102-1 so that the imaging device 102-1 captures additional imagery having a higher activity visibility metric value than imagery captured from the initial view.
The instructions to move the imaging device 102-1 may include instructions to physically move the imaging device 102-1 relative to the scene 104 in any suitable manner. For example, the imaging device 102-1 may include an articulating imaging device configured to articulate relative to the scene 104. In some examples, the imaging device 102-1 may articulate by virtue of being attached to an articulating support structure such that the imaging device 102-1 articulates correspondingly when the support structure articulates. In some examples, the imaging device 102-1 is mounted to an articulating arm of a robotic system, such as a teleoperated robotic arm of the robotic system. In some examples, the imaging device 102-1 is mounted to an articulating support structure in a surgical facility, such as to an articulating imaging device boom, a surgical cart, or another structure in the surgical facility. The viewpoint adjustment 312 may include an output configured for the structure to which the imaging device 102-1 is attached. For example, the viewpoint adjustment 312 may include an output to the robotic system and/or the articulating support structure to instruct the robotic system and/or the articulating support structure to change the pose of the imaging device 102-1. Additionally or alternatively, the viewpoint adjustment 312 may include an output to a user (e.g., on a screen, etc.) to instruct the user to change the pose of the imaging device 102-1.
In addition to or in lieu of the imaging device 102-1 physically moving relative to the scene 104, the imaging device 102-1 may be considered to move relative to the scene 104 in one or more other ways. In some embodiments, for example, movement of the imaging device 102-1 may include any change to the viewpoint of the imaging device 102-1. The change to the viewpoint may be caused by any suitable change to one or more parameters of the imaging device 102-1. As an example, a change in a zoom parameter changes the viewpoint of the imaging device 102-1. As another example, a change in the spatial position and/or orientation of the imaging device 102-1 changes the viewpoint of the imaging device 102-1. In such examples, the viewpoint adjustment 312 may include one or more parameter values output to the imaging device 102-1 to change one or more parameters of the imaging device 102-1. The viewpoint may be changed dynamically during the medical session (e.g., during any stage of the medical session, such as during pre-operative activity (e.g., setup activity), intra-operative activity, and/or post-operative activity).
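The following sketch, under stated assumptions, illustrates how a viewpoint adjustment request could be produced: if the current visibility value is below a threshold, candidate viewpoints are scored and the best one is routed either to a robotic support structure or to a user prompt. The function names, the candidate-viewpoint format, and the adjustment payload are assumptions, not interfaces from the disclosure.

```python
def propose_viewpoint_adjustment(sensor_id, visibility_value, threshold,
                                 candidate_viewpoints, score_candidate):
    """Return a viewpoint adjustment request when visibility is sub-optimal.

    candidate_viewpoints: iterable of candidate poses/zoom settings (assumed format).
    score_candidate: callable estimating an activity visibility metric value
                     for a candidate viewpoint (e.g., from generated imagery).
    """
    if visibility_value >= threshold:
        return None  # current viewpoint is adequate; no adjustment needed
    best = max(candidate_viewpoints, key=score_candidate, default=None)
    if best is None or score_candidate(best) <= visibility_value:
        return None  # no candidate is expected to improve visibility
    return {
        "sensor_id": sensor_id,
        "target_viewpoint": best,            # pose and/or zoom parameters
        "delivery": "robotic_arm_command",   # or "user_prompt" for manual repositioning
    }
```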
In some illustrative examples, the multi-sensor architecture may include multiple imaging devices 102 mounted on different components of the robotic surgical system, with one or more components configured to articulate relative to an imaging scene and relative to one or more of the other components of the robotic surgical system. For example, the imaging device 102-1 may be mounted on an articulating or non-articulating component of the robotic system, and the imaging device 102-2 may be mounted on another articulating component of the robotic system.
In some illustrative examples, one or more imaging devices 102 of the multi-sensor architecture may be mounted on additional or alternative components of a surgical facility (such as other components in the operating room). For example, imaging device 102-1 may be mounted on an articulating or non-articulating component of the surgical facility, and imaging device 102-2 may be mounted on another articulating component of the surgical facility. As another example, the imaging device 102-1 may be mounted on an articulating component of a robotic system, and the imaging device 102-2 may be mounted on an articulating or non-articulating component of a surgical facility.
Because the processing system 106 may determine the activity visibility metric values 306 and provide viewpoint adjustments 312 in real time, the processing system 106 may facilitate adjusting the imaging devices 102 such that the imaging devices 102 continuously provide imagery 304 whose viewpoints are optimized for each activity of the scene 104. In some examples, the viewpoint adjustment 312 may include specific guidance and/or directions as to where or how to move an imaging device 102 to improve visibility of the activity.
In some examples, the activity classifier 308 may include a generation module 314 that may be used to produce generated imagery. The generated imagery may be based on the imagery 304 captured by the imaging devices 102. As an example, another imaging device other than imaging devices 102-1 and 102-2 (not shown) may be capturing imagery with an activity visibility metric value indicating that the current viewpoint of the other imaging device is sub-optimal for activity visibility. The activity classifier 308 may utilize the generation module 314 to produce generated imagery based on the imagery 304-1 and 304-2. Using the imagery 304-1 and the imagery 304-2, the generated imagery may interpolate, model, and/or predict how the scene 104 would appear from other viewpoints (e.g., viewpoints between the current viewpoints of the imaging devices 102-1 and 102-2). Based on the generated imagery, the activity classifier 308 may determine a generated activity visibility metric value for the generated imagery. The activity classifier 308 may then select a viewpoint having a better generated activity visibility metric value and facilitate adjusting the other imaging device to change its pose so that the other imaging device may capture imagery of the scene 104 from the selected viewpoint.
Additionally or alternatively, the activity classifier 308 may utilize the generation module 314 to produce generated imagery from the current viewpoint of an imaging device. For example, in some embodiments, an imaging device (e.g., imaging device 102-1) may be fixed in a particular location. The activity classifier 308 may determine that the activity visibility metric value 306-1 of the imaging device 102-1 capturing the imagery 304-1 from the viewpoint of the fixed location is below an activity visibility metric threshold. In response, the activity classifier 308 may utilize the generation module 314 to produce generated imagery from the perspective of the fixed-location viewpoint but based on imagery from other imaging devices (e.g., imaging device 102-2, additional imaging devices not shown). The generated imagery may be used to supplement and/or replace the imagery 304-1 with imagery based on other viewpoints that may have higher activity visibility metric values (e.g., at least the activity visibility metric threshold and/or higher than the activity visibility metric value 306-1). For example, the imaging device 102-1 may have a low activity visibility metric value due to an object occluding the view of the scene 104. The generated imagery produced by the generation module 314 may utilize imagery from other viewpoints to reconstruct the object and/or the environment of the scene 104 as viewed from the viewpoint of the imaging device 102-1. The generated imagery and/or the imagery 304-1 supplemented with the generated imagery may be provided as an output to a user and/or further processed by the system 100.
The generation module 314 may be configured to produce the generated imagery in any suitable manner. For example, the generation module 314 may utilize one or more machine learning algorithms trained to generate imagery, such as a generative adversarial network (GAN) or the like. Additionally or alternatively, the generation module 314 may interpolate imagery captured by other imaging devices from different viewpoints, generate a model based on imagery captured by other imaging devices and generate imagery based on the model, etc.
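A sketch, under stated assumptions, of how a generation module might be used to score candidate viewpoints: `synthesize_view` stands in for any view-synthesis approach (e.g., a GAN or interpolation scheme) and `estimate_visibility` stands in for the activity visibility module; both are assumed interfaces for illustration, not APIs from the disclosure.

```python
def select_viewpoint_from_generated_imagery(captured_imagery, candidate_viewpoints,
                                            synthesize_view, estimate_visibility):
    """Score candidate viewpoints using generated imagery and pick the best one.

    captured_imagery: imagery from the currently available viewpoints.
    synthesize_view(captured_imagery, viewpoint): returns generated imagery modeling
        how the scene would appear from `viewpoint` (assumed interface).
    estimate_visibility(imagery): returns an activity visibility metric value.
    candidate_viewpoints is assumed to be non-empty.
    """
    scored = []
    for viewpoint in candidate_viewpoints:
        generated = synthesize_view(captured_imagery, viewpoint)
        scored.append((estimate_visibility(generated), viewpoint))
    best_score, best_viewpoint = max(scored, key=lambda item: item[0])
    return best_viewpoint, best_score
```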
In some examples, the activity classifier 308 may determine an overall activity visibility metric value that represents the visibility of the activity of the scene 104 based on imagery captured by a plurality of imaging devices (e.g., all imaging devices 102 of a multi-view architecture). The activity classifier 308 may also base the viewpoint adjustment 312 on the overall activity visibility metric value. Thus, in some cases, the activity classifier 308 may facilitate the adjustment of a particular imaging device to a new viewpoint that results in a lower activity visibility metric value for the particular imaging device but a higher overall activity visibility metric value.
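A minimal sketch of aggregating per-view metric values into an overall activity visibility metric value; the aggregation rule itself is an assumption made for illustration (the disclosure does not specify one).

```python
def overall_activity_visibility(per_view_metrics, aggregation="max"):
    """Aggregate per-view activity visibility metric values into an overall value.

    The aggregation rule is assumed: "max" rewards having at least one
    well-placed view, while "mean" rewards broad coverage across views.
    """
    values = list(per_view_metrics)
    if aggregation == "max":
        return max(values)
    if aggregation == "mean":
        return sum(values) / len(values)
    raise ValueError(f"unknown aggregation: {aggregation}")
```

Under the "max" rule, for example, moving one imaging device to a viewpoint that lowers its own metric value can still raise the overall value if another view improves as a result.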
Fig. 4 illustrates an example configuration 400 of a machine learning module 402 ("module 402") for a multi-view medical activity recognition system (e.g., "system 100"). While configuration 400 illustrates the training of module 402, which may be used by system 100 to determine activity visibility metrics and categorize activities, system 100 may additionally or alternatively utilize any suitable machine learning module trained in any suitable manner.
Configuration 400 shows a module 402 accessing an image 404 (e.g., images 404-1 through 404-N). The imagery 404 may be in the form of video clips, each of which includes a time-ordered series of images captured by an imaging device (e.g., imaging device 102-1). Each video clip may contain any suitable number (e.g., 16, 32, etc.) of frames (e.g., images).
The module 402 utilizes an activity recognition algorithm 406 (e.g., activity recognition algorithms 406-1 through 406-N) to extract features of the respective video clips to determine the activity of the scene captured in the video clips. The activity recognition algorithm 406 may be implemented by any suitable algorithm or algorithms, such as a fine-tuned I3D model or any other neural network or other algorithm.
The activity recognition algorithms 406 each provide an output to a classifier 408, the classifier 408 being configured to receive the outputs of the activity recognition algorithms 406 for the plurality of video clips of the imagery 404. Thus, classifier 408 utilizes features extracted from multiple video clips to identify the activity in each video clip. In some examples, configuration 400 may utilize classifier 408 to train module 402, but an implementation of module 402 may not rely on or include classifier 408, allowing module 402 to identify activity in an individual video clip in real time.
Classifier 408 may output the first classification of each video clip to a corresponding long short-term memory (LSTM) algorithm 410 (e.g., LSTM algorithms 410-1 through 410-N). The LSTM algorithms 410 may each be configured to process a corresponding video clip while also communicating with other LSTM algorithms 410 (e.g., the LSTM algorithms for a previous video clip and a subsequent video clip). Each LSTM algorithm 410 may process the video clip to also extract features for activity recognition of the scene captured by the video clip. The LSTM algorithms 410 may output the features to classifiers 412 (e.g., classifiers 412-1 through 412-N).
Classifiers 412 may receive the features extracted by the LSTM algorithms 410 to identify the activity captured in the corresponding video clip. Each classifier 412 may also receive the first classification of the video clip generated by classifier 408 and base its classification of the video clip at least in part on the first classification. Based on the features extracted by the LSTM algorithm 410 and the first classification generated by classifier 408, the classifier 412 may output a final classification of the activity captured by the corresponding video clip.
The final classification may be a selection from one or more predefined activities 414 associated with the medical session captured by the imagery 404. For example, the final classification may be a one-dimensional vector having a length corresponding to the number of predefined activities for the medical session. Each element of the vector may have a value corresponding to the probability that the respective activity is the activity identified in the respective video clip.
The final classification may be provided to two layers of regression algorithms 416 and 418 (e.g., regression algorithms 416-1 through 416-N and regression algorithms 418-1 through 418-N). The regression algorithms 416 and 418 may be configured to generate an activity visibility score based on the corresponding video clip. In some examples, the activity visibility score may be further based on the final classification of the respective video clip determined by the classifier 412. In other examples, the activity visibility score may be determined independently of the final classification.
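A hedged, simplified sketch of the described pipeline in PyTorch: per-clip features, a first classifier over the clip sequence, an LSTM across clips, a final classifier, and a two-layer regression head producing a visibility score. The feature dimensions, the use of precomputed features in place of a fine-tuned I3D backbone, and the specific layer sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class MultiClipActivityModule(nn.Module):
    """Simplified sketch of module 402: clip features -> activity class + visibility score."""

    def __init__(self, feature_dim=1024, hidden_dim=256, num_activities=15):
        super().__init__()
        # Stand-in for per-clip activity recognition features (e.g., from an I3D model).
        self.clip_encoder = nn.Linear(feature_dim, hidden_dim)
        self.first_classifier = nn.Linear(hidden_dim, num_activities)   # analogous to classifier 408
        self.temporal_lstm = nn.LSTM(hidden_dim + num_activities, hidden_dim,
                                     batch_first=True)                  # analogous to LSTM 410
        self.final_classifier = nn.Linear(hidden_dim, num_activities)   # analogous to classifier 412
        self.visibility_head = nn.Sequential(                           # analogous to regression 416/418
            nn.Linear(hidden_dim + num_activities, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, clip_features):
        # clip_features: (batch, num_clips, feature_dim) precomputed per-clip features.
        encoded = torch.relu(self.clip_encoder(clip_features))
        first_logits = self.first_classifier(encoded)
        lstm_in = torch.cat([encoded, first_logits.softmax(dim=-1)], dim=-1)
        lstm_out, _ = self.temporal_lstm(lstm_in)
        final_logits = self.final_classifier(lstm_out)
        visibility = self.visibility_head(
            torch.cat([lstm_out, final_logits.softmax(dim=-1)], dim=-1)
        ).squeeze(-1)
        return final_logits, visibility
```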
The module 402 may be trained in any suitable manner. For example, a supervised learning algorithm may be utilized to train the module 402 end-to-end. The training data set may include video clips labeled with the activity of the scene captured in the video clips and with activity visibility metric values for the activity based on the imagery of the video clips. The activity labels may be verified by a user and/or against synchronized imagery from other imaging devices. The activity visibility metric labels may be provided by a user. Based on such a labeled data set, module 402 may learn to receive imagery as input and predict activity visibility metric values based on the imagery and the activity, as well as predict activity classifications. The prediction of an activity visibility metric value may be generated based on and/or independently of the activity classification prediction.
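One plausible, assumed training objective for such end-to-end supervised training (not the objective stated in the disclosure) combines a classification loss on the activity labels with a regression loss on the labeled visibility metric values.

```python
import torch.nn.functional as F

def training_loss(final_logits, visibility_pred, activity_labels, visibility_labels,
                  visibility_weight=1.0):
    """Combined loss for a labeled batch (assumed objective for illustration).

    final_logits: (batch, num_clips, num_activities); activity_labels: (batch, num_clips) long tensor.
    visibility_pred / visibility_labels: (batch, num_clips) float tensors.
    """
    ce = F.cross_entropy(final_logits.flatten(0, 1), activity_labels.flatten())
    mse = F.mse_loss(visibility_pred, visibility_labels)
    return ce + visibility_weight * mse
```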
Fig. 5 illustrates an example computer-assisted robotic surgical system 500 ("surgical system 500") associated with system 100. The system 100 may be implemented by the surgical system 500, connected to the surgical system 500, and/or otherwise used in conjunction with the surgical system 500. For example, system 100 may be implemented by one or more components of surgical system 500 (such as a manipulation system, a user control system, or an auxiliary system). As another example, system 100 may be implemented by a stand-alone computing system communicatively coupled to a computer-assisted surgical system.
As shown, surgical system 500 may include a manipulation system 502, a user control system 504, and an auxiliary system 506 communicatively coupled to each other. The surgical team may utilize the surgical system 500 to perform a computer-assisted surgical procedure on the patient 508. As shown, the surgical team may include a surgeon 510-1, an assistant 510-2, a nurse 510-3, and an anesthesiologist 510-4, all of which may be collectively referred to as a "surgical team member 510". Additional or alternative surgical team members may be present during the surgical session.
While fig. 5 illustrates an ongoing minimally invasive surgical procedure, it should be appreciated that the surgical system 500 may similarly be used to perform open surgical procedures or other types of surgical procedures that may similarly benefit from the accuracy and convenience of the surgical system 500. In addition, it will be appreciated that a medical session such as a surgical session, throughout which the surgical system 500 may be employed, may include not only an intraoperative phase of a surgical procedure (as shown in fig. 5), but may also include preoperative (which may include setup of the surgical system 500), postoperative, and/or other suitable phases of the surgical session.
As shown in FIG. 5, manipulation system 502 can include a plurality of manipulator arms 512 (e.g., manipulator arms 512-1 through 512-4) to which a plurality of surgical instruments can be coupled. Each surgical instrument may be implemented by: any suitable surgical tool (e.g., a tool with tissue interaction functionality), medical tool, imaging device (e.g., an endoscope, ultrasonic tool, etc.), sensing instrument (e.g., a force sensing surgical instrument), diagnostic instrument, or other instrument that may be used in a computer-assisted surgical procedure on patient 508 (e.g., by being at least partially inserted into patient 508 and manipulated to perform a computer-assisted surgical procedure on patient 508). Although manipulation system 502 is depicted and described herein as including four manipulator arms 512, it will be appreciated that manipulation system 502 may include only a single manipulator arm 512 or any other number of manipulator arms that may serve a particular implementation.
Manipulator arm 512 and/or a surgical instrument attached to manipulator arm 512 may include one or more displacement transducers, orientation sensors, and/or position sensors for generating raw (i.e., uncorrected) kinematic information. One or more components of the surgical system 500 may be configured to utilize kinematic information to track (e.g., determine pose) and/or control a surgical instrument, as well as anything connected to the instrument and/or arm. As described herein, the system 100 can utilize kinematic information to track components of the surgical system 500 (e.g., the manipulator arm 512 and/or a surgical instrument attached to the manipulator arm 512).
User control system 504 may be configured to facilitate control of manipulator arms 512 and the surgical instruments attached to manipulator arms 512 by surgeon 510-1. For example, surgeon 510-1 may interact with user control system 504 to remotely move or manipulate manipulator arms 512 and the surgical instruments. To this end, the user control system 504 may provide imagery (e.g., high-definition 3D imagery) of the surgical site associated with the patient 508, captured by an imaging system (e.g., an endoscope), to the surgeon 510-1. In some examples, user control system 504 may include a stereoscopic viewer having two displays in which stereoscopic imagery of the surgical site associated with patient 508, generated by a stereoscopic imaging system, may be viewed by surgeon 510-1. The surgeon 510-1 may utilize the imagery displayed by the user control system 504 to perform one or more procedures with one or more surgical instruments attached to the manipulator arms 512.
To facilitate control of the surgical instrument, the user control system 504 may include a set of master controllers. These master controllers may be manipulated by the surgeon 510-1 to control movement of the surgical instrument (e.g., by utilizing robotic and/or teleoperational techniques). The master controller may be configured to detect a wide variety of hand, wrist, and finger movements of the surgeon 510-1. In this manner, surgeon 510-1 may intuitively perform a procedure using one or more surgical instruments.
The auxiliary system 506 may include one or more computing devices configured to perform processing operations of the surgical system 500. In such a configuration, one or more computing devices included in auxiliary system 506 may control and/or coordinate operations performed by various other components of surgical system 500 (e.g., manipulation system 502 and user control system 504). For example, computing devices included in user control system 504 may communicate instructions to manipulation system 502 via one or more computing devices included in auxiliary system 506. As another example, the auxiliary system 506 may receive and process image data representing imagery captured by one or more imaging devices attached to the manipulation system 502.
In some examples, the assistance system 506 may be configured to present visual content to the surgical team member 510 that may not have access to the images provided to the surgeon 510-1 at the user control system 504. To this end, the assistance system 506 may include a display monitor 514, the display monitor 514 configured to display one or more user interfaces, such as images of the surgical site, information associated with the patient 508 and/or the surgical procedure, and/or any other visual content that may serve a particular implementation. For example, display monitor 514 may display an image of the surgical site, along with additional content (e.g., graphical content, contextual information, etc.) that is displayed concurrently with the image. In some implementations, display monitor 514 is implemented by a touch screen display that surgical team member 510 may interact with (e.g., via touch gestures) to provide user input to surgical system 500.
Manipulation system 502, user control system 504, and auxiliary system 506 may be communicatively coupled to one another in any suitable manner. For example, as shown in FIG. 5, manipulation system 502, user control system 504, and auxiliary system 506 may be communicatively coupled via control line 516, and control line 516 may represent any wired or wireless communication link that may serve a particular implementation. To this end, manipulation system 502, user control system 504, and auxiliary system 506 may each comprise one or more wired or wireless communication interfaces, such as one or more local area network interfaces, Wi-Fi network interfaces, cellular interfaces, and the like.
In some examples, an imaging device, such as imaging device 102, may be attached to a component of surgical system 500 and/or to a component of a surgical facility in which surgical system 500 is located. For example, the imaging device may be attached to a component of the manipulation system 502.
Fig. 6 depicts an illustrative configuration 600 of imaging devices 102 (imaging devices 102-1 through 102-4) attached to components of manipulation system 502. As shown, imaging device 102-1 may be attached to an Orientation Platform (OP) 602 of manipulation system 502, imaging device 102-2 may be attached to manipulator arm 512-1 of manipulation system 502, imaging device 102-3 may be attached to manipulator arm 512-4 of manipulation system 502, and imaging device 102-4 may be attached to a base 604 of manipulation system 502. The imaging device 102-1 attached to the OP 602 may be referred to as an OP imaging device, the imaging device 102-2 attached to the manipulator arm 512-1 may be referred to as a universal set manipulator 1 (USM 1) imaging device, the imaging device 102-3 attached to the manipulator arm 512-4 may be referred to as a universal set manipulator 4 (USM 4) imaging device, and the imaging device 102-4 attached to the base 604 may be referred to as a BASE imaging device. In implementations where the manipulation system 502 is positioned proximate to the patient (e.g., as a patient-side cart), placement of the imaging devices 102 at strategic locations on the manipulation system 502 provides favorable imaging viewpoints proximate to the patient and the surgical procedure performed on the patient.
In some implementations, components of manipulation system 502 (or of other robotic systems in other examples) may have redundant degrees of freedom that allow multiple configurations of the components to reach the same output position of an end effector (e.g., an instrument connected to a manipulator arm 512) attached to the components. Thus, the processing system 106 can direct movement of the components of the manipulation system 502 without affecting the position of the end effector attached to the components. This may allow a component to be repositioned for purposes of activity recognition without changing the position of the end effector attached to the component.
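This kind of redundancy resolution can be illustrated with a standard null-space projection (a textbook sketch, not the patent's method): secondary joint motion is restricted to the null space of the end-effector Jacobian so a camera-bearing link can be repositioned without moving the end effector, to first order.

```python
import numpy as np

def null_space_joint_step(jacobian, desired_joint_motion):
    """Project a desired joint motion into the Jacobian's null space.

    jacobian: (6, n) end-effector Jacobian for a redundant (n > 6) manipulator.
    desired_joint_motion: (n,) secondary-task joint velocities, e.g., chosen to
        improve an attached imaging device's viewpoint.
    The returned joint velocities leave the end-effector pose unchanged
    (to first order), so repositioning does not disturb the instrument.
    """
    J = np.asarray(jacobian, dtype=float)
    J_pinv = np.linalg.pinv(J)
    n = J.shape[1]
    null_projector = np.eye(n) - J_pinv @ J
    return null_projector @ np.asarray(desired_joint_motion, dtype=float)
```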
The illustrated placement of imaging devices 102 on components of manipulation system 502 is exemplary. Additional and/or alternative positionings of any suitable number of imaging devices 102 on manipulation system 502, on other components of surgical system 500, and/or on other components at a surgical facility may be utilized in other implementations. Imaging devices 102 may be attached to components of manipulation system 502, other components of surgical system 500, and/or other components at a surgical facility in any suitable manner.
Fig. 7 illustrates an example method 700 of a multi-view medical activity recognition system. While fig. 7 illustrates example operations according to one embodiment, other embodiments may omit, add, reorder, combine, and/or modify any of the operations illustrated in fig. 7. One or more of the operations shown in fig. 7 may be performed by an activity recognition system, such as system 100, any components included therein, and/or any implementation thereof.
In operation 702, the activity recognition system may access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints. Operation 702 may be performed in any of the ways described herein.
In operation 704, the activity recognition system may determine an activity visibility metric value of the first sensor during the medical session and based on the first imagery. Operation 704 may be performed in any of the ways described herein.
In operation 706, the activity recognition system may facilitate adjusting the first viewpoint of the first sensor based on the activity visibility metric value. Operation 706 may be performed in any of the manners described herein.
Fig. 8 illustrates an example method 800 of a multi-view medical activity recognition system. While fig. 8 illustrates example operations according to one embodiment, other embodiments may omit, add, reorder, combine, and/or modify any of the operations illustrated in fig. 8. One or more of the operations shown in fig. 8 may be performed by an activity recognition system, such as system 100, any components included therein, and/or any implementation thereof.
In operation 802, the activity recognition system may access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints. Operation 802 may be performed in any of the ways described herein.
In operation 804, the activity recognition system may determine a first classification of an activity of the scene based on the first imagery. Operation 804 may be performed in any manner described herein.
In operation 806, the activity recognition system may determine an activity visibility metric value of the first sensor during the medical session and based on the first imagery. Operation 806 may be performed in any of the ways described herein.
In operation 808, the activity recognition system may determine that the activity visibility metric value of the first sensor is below an activity visibility metric threshold. Operation 808 may be performed in any manner described herein.
In operation 810, the activity recognition system may reduce a weight of the first classification of the activity of the scene for determining an overall classification of the activity of the scene based on the imagery of the scene. Operation 810 may be performed in any of the ways described herein.
The multi-view medical activity recognition principles, systems, and methods described herein may be used in a variety of applications. As an example, one or more activity recognition aspects described herein may be used to conduct surgical workflow analysis in real-time or retrospectively. As another example, one or more activity recognition aspects described herein may be used for automatic transcription of surgical sessions (e.g., for documentation, further planning, and/or resource allocation purposes). As another example, one or more of the activity recognition aspects described herein may be used for automation of surgical subtasks. As another example, one or more of the activity recognition aspects described herein may be used for computer-aided setup of a surgical system and/or surgical device (e.g., one or more operations of a robotic surgical system may be automated based on perception of a surgical scene and automatic movement of the robotic surgical system). These examples of applications of the activity recognition principles, systems, and methods described herein are exemplary. The activity recognition principles, systems, and methods described herein may be implemented for other suitable applications.
In some examples, a non-transitory computer-readable medium storing computer-readable instructions may be provided in accordance with the principles described herein. The instructions, when executed by a processor of a computing device, may direct the processor and/or the computing device to perform one or more operations, including one or more operations described herein. Such instructions may be stored and/or transmitted using any of a variety of known computer-readable media.
A non-transitory computer-readable medium as referred to herein may include any non-transitory storage medium that participates in providing data (e.g., instructions) that may be read and/or executed by a computing device (e.g., by a processor of the computing device). For example, a non-transitory computer readable medium may include, but is not limited to, any combination of non-volatile storage media and/or volatile storage media. Illustrative non-volatile storage media include, but are not limited to, read-only memory, flash memory, solid state drives, magnetic storage devices (e.g., hard disks, floppy disks, tape, etc.), ferroelectric random access memory ("RAM"), and optical disks (e.g., compact disks, digital video disks, Blu-ray disks, etc.). Illustrative volatile storage media include, but are not limited to, RAM (e.g., dynamic RAM).
Fig. 9 illustrates an example computing device 900 that can be specifically configured to perform one or more of the processes described herein. Any of the systems, units, computing devices, and/or other components described herein may be implemented or realized by computing device 900.
As shown in fig. 9, computing device 900 may include a communication interface 902, a processor 904, a storage 906, and an input/output ("I/O") module 908 communicatively connected to each other via a communication infrastructure 910. Although the example computing device 900 is illustrated in fig. 9, the components illustrated in fig. 9 are not intended to be limiting. Additional or alternative components may be utilized in other embodiments. The components of computing device 900 shown in fig. 9 will now be described in more detail.
The communication interface 902 may be configured to communicate with one or more computing devices. Examples of communication interface 902 include, but are not limited to, a wired network interface (such as a network interface card), a wireless network interface (such as a wireless network interface card), a modem, an audio/video connection, and any other suitable interface.
Processor 904 generally represents any type or form of processing unit capable of processing data and/or interpreting, executing, and/or directing the execution of one or more of the instructions, processes, and/or operations described herein. The processor 904 may perform operations by executing computer-executable instructions 912 (e.g., applications, software, code, and/or other executable data instances) stored in the storage 906.
Storage 906 may include one or more data storage media, devices, or configurations, and may employ any type, form, and combination of data storage media and/or devices. For example, storage 906 may include, but is not limited to, any combination of the non-volatile media and/or volatile media described herein. Electronic data, including data described herein, may be temporarily and/or permanently stored in storage 906. For example, data representing computer-executable instructions 912 configured to direct processor 904 to perform any of the operations described herein may be stored within storage 906. In some examples, the data may be arranged in one or more databases residing within the storage 906.
The I/O module 908 may include one or more I/O modules configured to receive user input and provide user output. The I/O module 908 may include any hardware, firmware, software, or combination thereof that supports input and output capabilities. For example, the I/O module 908 may include hardware and/or software for capturing user input, including but not limited to a keyboard or keypad, a touch screen component (e.g., a touch screen display), a receiver (e.g., an RF or infrared receiver), a motion sensor, and/or one or more input buttons.
The I/O module 908 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., a display driver), one or more audio speakers, and one or more audio drivers. In some implementations, the I/O module 908 is configured to provide graphical data to a display for presentation to a user. The graphical data may represent one or more graphical user interfaces and/or any other graphical content that may serve a particular implementation.
In some examples, any of the systems, modules, and/or apparatus described herein may be implemented by or within one or more components of computing device 900. For example, one or more applications 912 residing within the storage 906 may be configured to direct the processor 904 to perform one or more operations or functions associated with the processing system 108 of the system 100.
As mentioned, one or more operations described herein may be performed during a medical session, e.g., dynamically, in real-time, and/or near real-time. As used herein, an operation described as occurring "in real time" will be understood to be performed immediately and without excessive delay, even though absolute zero delay is not possible.
Any of the systems, devices, and/or components thereof may be implemented in any suitable combination or sub-combination. For example, any of the systems, devices, and/or components thereof may be implemented as a device configured to perform one or more of the operations described herein.
In the description herein, various illustrative embodiments have been described. It will, however, be evident that various modifications and changes may be made thereto, and additional embodiments may be implemented, without departing from the scope of the invention as set forth in the appended claims. For example, certain features of one embodiment described herein may be combined with or substituted for features of another embodiment described herein. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (31)

1. A system, comprising:
A memory storing instructions;
a processor communicatively coupled to the memory and configured to execute the instructions to:
accessing imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including a first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints;
determining an activity visibility metric value of the first sensor during the medical session and based on the first imagery; and
facilitating adjusting the first viewpoint of the first sensor based on the activity visibility metric value.
2. The system of claim 1, wherein the facilitating adjusting the first viewpoint of the first sensor comprises providing an output to a robotic system to instruct the robotic system to change a pose of the first sensor.
3. The system of claim 1, wherein the facilitating adjusting the first viewpoint of the first sensor comprises providing an output to a user to instruct the user to change a pose of the first sensor.
4. The system of claim 1, wherein:
The instructions include a machine learning module that is trained based on training imagery labeled with activity of a scene captured in the training imagery; and
the determining the activity visibility metric value of the first sensor includes utilizing the machine learning module.
5. The system of claim 1, wherein the processor is further configured to execute the instructions to:
accessing an additional image of the scene of the medical session captured by the plurality of sensors from another plurality of viewpoints, the additional image comprising a second image captured by the first sensor from a second viewpoint different from the first viewpoint; and
determining, based on the additional image, an additional activity visibility metric value that is higher than the activity visibility metric value.
6. The system of claim 1, wherein:
the image of the scene includes:
a second image captured by a second sensor of the plurality of sensors from a second viewpoint of the plurality of viewpoints, and
a third image captured by a third sensor of the plurality of sensors from a third viewpoint of the plurality of viewpoints; and
The processor is further configured to execute the instructions to:
determining that the activity visibility metric value of the first sensor is below an activity visibility metric threshold, and
based on the determining that the activity visibility metric value of the first sensor is below the activity visibility metric threshold, utilizing a generation module to generate generated imagery based on the second image and the third image.
7. The system of claim 6, wherein the generated imagery comprises imagery generated based on the first viewpoint, the generated imagery having a generated activity visibility metric value that is higher than the activity visibility metric value of the first sensor.
8. The system of claim 6, wherein:
the generated imagery includes imagery generated based on a fourth viewpoint, the generated imagery having a generated activity visibility metric value that is higher than the activity visibility metric value of the first sensor; and
The facilitating adjusting the first viewpoint of the first sensor includes providing an output including instructions to change a pose of the first sensor to capture additional imagery of the scene from the fourth viewpoint.
9. The system of claim 6, wherein:
the processor is further configured to execute the instructions to:
determining an activity visibility metric value of the second sensor based on the second image,
determining an activity visibility metric value of the third sensor based on the third image; and
the utilizing the generation module to generate the generated imagery based on the second image and the third image is further based on the activity visibility metric values of the second sensor and the third sensor being at least the activity visibility metric threshold.
10. The system of claim 1, wherein:
the imagery of the scene includes a second imagery captured by a second sensor of the plurality of sensors from a second viewpoint of the plurality of viewpoints;
the processor is further configured to execute the instructions to determine an overall activity visibility metric value for the plurality of sensors based on the first image and the second image; and
the facilitating adjusting the first view of the first sensor includes adjusting the first view to improve the overall activity visibility metric.
11. The system of claim 10, wherein the facilitating adjusting the first viewpoint results in a lower activity visibility metric value for the first sensor and a higher overall activity visibility metric value for the plurality of sensors.
12. The system of claim 1, wherein the scene of the medical session is within a patient.
13. The system of claim 1, wherein the activity visibility metric represents a ranking of a degree of visibility of an activity of the scene in the first imagery.
14. A system, comprising:
a memory storing instructions;
a processor communicatively coupled to the memory and configured to execute the instructions to:
accessing imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including a first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints;
determining a first classification of activity of the scene based on the first imagery;
determining an activity visibility metric value of the first sensor during the medical session and based on the first imagery;
determining that the activity visibility metric value of the first sensor is below an activity visibility metric threshold; and
based on the determining that the activity visibility metric value of the first sensor is below the activity visibility metric threshold, reducing a weight of the first classification of the activity of the scene for use in determining an overall classification of the activity of the scene based on the imagery of the scene.
15. The system of claim 14, wherein the processor is further configured to execute the instructions to facilitate adjusting the first viewpoint of the first sensor based on the determining that the activity visibility metric value of the first sensor is below the activity visibility metric threshold.
16. A method, comprising:
accessing, by a processor, an image of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the image comprising a first image captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints;
determining, by the processor, an activity visibility metric value of the first sensor during the medical session and based on the first image; and
facilitating, by the processor, adjustment of the first viewpoint of the first sensor based on the activity visibility metric value.
17. The method of claim 16, wherein the facilitating adjusting the first viewpoint of the first sensor comprises providing an output to a robotic system to instruct the robotic system to change a pose of the first sensor.
18. The method of claim 16, wherein the facilitating adjusting the first viewpoint of the first sensor comprises providing an output to a user to instruct the user to change a pose of the first sensor.
19. The method of claim 16, wherein the determining the activity visibility metric value of the first sensor comprises utilizing a machine learning module that is trained based on training images labeled with activity of a scene captured in the training images.
20. The method of claim 16, further comprising:
accessing, by the processor, an additional image of the scene of the medical session captured by the plurality of sensors from another plurality of viewpoints, the additional image comprising a second image captured by the first sensor from a second viewpoint different from the first viewpoint; and
determining, by the processor and based on the additional image, an additional activity visibility metric value that is higher than the activity visibility metric value.
21. The method according to claim 16, wherein:
the image of the scene includes:
a second image captured by a second sensor of the plurality of sensors from a second viewpoint of the plurality of viewpoints, and
a third image captured by a third sensor of the plurality of sensors from a third viewpoint of the plurality of viewpoints; and
the method further comprises the steps of:
determining, by the processor, that the activity visibility metric value of the first sensor is below an activity visibility metric threshold, and
based on the determining that the activity visibility metric value of the first sensor is below the activity visibility metric threshold, generating, by the processor and with a generation module, generated imagery based on the second image and the third image.
22. The method of claim 21, wherein the generated imagery comprises imagery generated based on the first viewpoint, the generated imagery having a generated activity visibility metric value that is higher than the activity visibility metric value of the first sensor.
23. The method according to claim 21, wherein:
the generated imagery includes imagery generated based on a fourth viewpoint, the generated imagery having a generated activity visibility metric value that is higher than the activity visibility metric value of the first sensor; and
the facilitating adjusting the first viewpoint of the first sensor includes providing an output including instructions to change a pose of the first sensor to capture additional imagery of the scene from the fourth viewpoint.
24. The method of claim 21, further comprising:
determining, by the processor, an activity visibility metric value for the second sensor based on the second image; and
determining, by the processor, an activity visibility metric value for the third sensor based on the third image,
wherein the generating, with the generation module, the generated imagery based on the second image and the third image is further based on the activity visibility metric values of the second sensor and the third sensor being at least the activity visibility metric threshold.
25. The method according to claim 16, wherein:
the image of the scene includes a second image captured by a second sensor of the plurality of sensors from a second viewpoint of the plurality of viewpoints;
the method further includes determining, by the processor, an overall activity visibility metric value for the plurality of sensors based on the first image and the second image; and
the facilitating adjusting the first viewpoint of the first sensor includes adjusting the first viewpoint to improve the overall activity visibility metric value.
26. The method of claim 25, wherein the facilitating adjusting the first viewpoint results in a lower activity visibility metric value for the first sensor and a higher overall activity visibility metric value for the plurality of sensors.
27. The method of claim 16, wherein the scene of the medical session is within a patient.
28. A method, comprising:
accessing, by a processor, an image of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the image comprising a first image captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints;
determining, by the processor, a first classification of activity of the scene based on the first image;
determining, by the processor, an activity visibility metric value of the first sensor during the medical session and based on the first image;
determining, by the processor, that the activity visibility metric value of the first sensor is below an activity visibility metric threshold; and
based on the determining that the activity visibility metric value of the first sensor is below the activity visibility metric threshold, reducing, by the processor, a weight of the first classification of the activity of the scene for use in determining an overall classification of the activity of the scene based on the imagery of the scene.
29. The method of claim 28, further comprising:
based on the determining that the activity visibility metric value of the first sensor is below the activity visibility metric threshold, facilitating, by the processor, adjusting the first viewpoint of the first sensor.
30. A non-transitory computer-readable medium storing instructions executable by a processor to:
accessing imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including a first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints;
determining an activity visibility metric value of the first sensor during the medical session and based on the first imagery; and
facilitating adjusting the first viewpoint of the first sensor based on the activity visibility metric value.
31. A non-transitory computer-readable medium storing instructions executable by a processor to:
accessing imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including a first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints;
determining a first classification of activity of the scene based on the first imagery;
determining an activity visibility metric value of the first sensor during the medical session and based on the first imagery;
determining that the activity visibility metric value of the first sensor is below an activity visibility metric threshold; and
based on the determining that the activity visibility metric value of the first sensor is below the activity visibility metric threshold, reducing a weight of the first classification of the activity of the scene for use in determining an overall classification of the activity of the scene based on the imagery of the scene.
CN202180076376.9A 2020-11-13 2021-11-12 Visibility metrics in multi-view medical activity recognition systems and methods Pending CN116508070A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US63/113,685 2020-11-13
US202163141853P 2021-01-26 2021-01-26
US63/141,853 2021-01-26
US63/141,830 2021-01-26
PCT/US2021/059213 WO2022104118A1 (en) 2020-11-13 2021-11-12 Visibility metrics in multi-view medical activity recognition systems and methods

Publications (1)

Publication Number Publication Date
CN116508070A true CN116508070A (en) 2023-07-28

Family

ID=87177483

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202180076403.2A Pending CN116472565A (en) 2020-11-13 2021-11-12 Multi-view medical activity recognition system and method
CN202180076376.9A Pending CN116508070A (en) 2020-11-13 2021-11-12 Visibility metrics in multi-view medical activity recognition systems and methods

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202180076403.2A Pending CN116472565A (en) 2020-11-13 2021-11-12 Multi-view medical activity recognition system and method

Country Status (1)

Country Link
CN (2) CN116472565A (en)

Also Published As

Publication number Publication date
CN116472565A (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN108472084B (en) Surgical system with training or assisting function
KR102523779B1 (en) Construction of a Surgical System with a Surgical Procedure Atlas
US20210290317A1 (en) Systems and methods for tracking a position of a robotically-manipulated surgical instrument
KR101926123B1 (en) Device and method for segmenting surgical image
US11896441B2 (en) Systems and methods for measuring a distance using a stereoscopic endoscope
US20230093342A1 (en) Method and system for facilitating remote presentation or interaction
US20220215539A1 (en) Composite medical imaging systems and methods
US20220096163A1 (en) Camera control systems and methods for a computer-assisted surgical system
US20230410499A1 (en) Visibility metrics in multi-view medical activity recognition systems and methods
US20220392084A1 (en) Scene perception systems and methods
US10854005B2 (en) Visualization of ultrasound images in physical space
CN113366414A (en) System and method for facilitating optimization of an imaging device viewpoint during an operating session of a computer-assisted operating system
CN116508070A (en) Visibility metrics in multi-view medical activity recognition systems and methods
CN114830638A (en) System and method for telestration with spatial memory
US11488382B2 (en) User presence/absence recognition during robotic surgeries using deep learning
KR20190133425A (en) Program and method for displaying surgical assist image
US20230139425A1 (en) Systems and methods for optimizing configurations of a computer-assisted surgical system for reachability of target objects
WO2024058965A1 (en) Determination of a contour physical distance within a subject based on a deformable three-dimensional model
WO2024072689A1 (en) Systems and methods for determining a force applied to an anatomical object within a subject based on a deformable three-dimensional model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination