WO2023063940A1 - Region of interest cropped images - Google Patents

Info

Publication number
WO2023063940A1
Authority
WO
WIPO (PCT)
Prior art keywords
roi
image
examples
size
captured
Prior art date
Application number
PCT/US2021/054736
Other languages
French (fr)
Inventor
Qian Lin
Tianqi GUO
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P.
Priority to PCT/US2021/054736
Publication of WO2023063940A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4046 Scaling the whole image or part thereof using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/147 Determination of region of interest
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

Definitions

  • a computing device allows a user to utilize computing device operations for work, education, gaming, multimedia, and/or other uses.
  • Computing devices are utilized in a non-portable setting, such as at a desktop.
  • the computing device allows a user to carry or otherwise bring the computing device along while in a mobile setting.
  • These computing devices can be connected to scanner devices, cameras, and/or other image capture devices to capture images of different areas or physical devices.
  • Figure 1 illustrates an example of a device for region of interest cropped images.
  • Figure 2 illustrates an example of a memory resource storing instructions for region of interest cropped images.
  • Figure 3 illustrates an example of a device for region of interest cropped images.
  • Figure 4 illustrates an example of a method for region of interest cropped images.
  • a user may utilize a computing device for various purposes, such as for business and/or recreational use.
  • the term “computing device” refers to an electronic system having a processor resource and a memory resource.
  • Examples of computing devices can include, for instance, a laptop computer, a notebook computer, a desktop computer, an all-in-one (AIO) computer, networking device (e.g., router, switch, etc.), and/or a mobile device (e.g., a smart phone, tablet, personal digital assistant, smart glasses, a wrist-worn device such as a smart watch, etc.), among other types of computing devices.
  • a mobile device refers to devices that are (or can be) carried and/or worn by a user.
  • computing devices are utilized as teleconference devices.
  • a teleconference device is utilized to provide audio and/or video data to remote computing devices.
  • a teleconference device is a computing device that communicates with remote computing devices and allows remote users to communicate through audio and/or video data transferred between the plurality of computing devices.
  • a plurality of computing devices can be utilized for a teleconference by connecting to a teleconference application.
  • the teleconference application includes instructions that are utilized to receive audio and/or video data from the plurality of computing devices and provide the audio and/or video data to each of the plurality of computing devices.
  • the teleconference application includes a teleconference portal that is utilized by a plurality of computing devices to exchange audio and/or video data.
  • a teleconference portal refers to a gateway for a website that provides teleconferencing functions.
  • a controller can intercept images captured by an imaging device (e.g., camera, video camera, etc.).
  • the imaging device utilizes field-programmable gate array (FPGA) machine learning hardware.
  • the FPGA machine learning hardware can be customized for a particular architecture (e.g., convolutional neural network, etc.) to perform the functions described herein.
  • the controller or computing device is able to intercept images captured by an imaging device or video imaging device. The intercepted images can be altered before being provided to an application (e.g., teleconference application, social media application, etc.) such that another user utilizing the application views the altered image and not the original image captured by the imaging device.
  • an imaging device or video imaging device is adjustable to focus on a particular area or particular object within a viewable area.
  • the frame of view of an imaging device can be adjusted to capture a smaller area or a larger area.
  • the imaging device may be positioned at a particular location that may be adjusted to focus on a particular object or person and then be adjusted to focus on a different object or person.
  • a lens or digital setting may have to be adjusted to focus on the particular region of interest at a particular time.
  • the adjustment of the imaging device may take time and/or a region of interest may move within the viewable area, which may need additional adjusting to focus on the area of interest.
  • the present disclosure relates to generating region of interest (ROI) cropped images.
  • the ROI is detected utilizing an ROI detection method.
  • the ROI includes a human user for a particular application or particular setting.
  • the ROI detection method is a human user detection method that provides coordinates for a human user within the viewable area of the imaging device.
  • the coordinates are provided to the computing device to alter the image based on the provided coordinates.
  • the coordinates may provide a boundary box that surrounds a portion of the user (e.g., head and shoulders, head to waist, head to feet, etc.) and the computing device crops the image based on the coordinates.
  • the cropped portion of a captured image may not have a relatively high quality or visual quality since the cropped portion could be a relatively small portion of the original size of the image.
  • the computing device increases the size of the cropped portion to the original size of the image and performs a machine learning super-resolution method on the cropped portion to increase the quality of the image.
  • the image is sent or provided to an application or remote device. In this way, the application may not receive the original image and may only receive the altered cropped portion of the image from the computing device. This allows a user of the imaging device and/or computing device to provide high quality images of the ROI without having to alter the settings or lens of the imaging device.
  • Figure 1 illustrates an example of a computing device 102 for region of interest cropped images.
  • the computing device 102 includes a processor 104 and a memory resource 106 to store instructions that are executed by the processor 104.
  • the computing device 102 includes a processor 104 and a memory resource 106 storing instructions 108, 110, 112, 114, 116, that can be executed by the processor 104 to perform particular functions.
  • the computing device 102 is communicatively coupled to an imaging device 120 through a communication path 118. In some examples, the communication path 118 allows the computing device 102 to exchange signals (e.g., communication signals, electrical signals, etc.) with the imaging device 120.
  • the imaging device 120 is capable of capturing an image of an area.
  • the imaging device 120 is a web cam or video imaging device that is capable of capturing a plurality of image frames that generate a video.
  • the plurality of images are provided to an application and sent to a remote device or remote display device.
  • the images captured by the imaging device 120 are intercepted by the computing device 102 before being provided to the remote device or remote display device. In this way, an altered image of a region of interest (ROI) is provided to the remote device without providing the original image captured by the imaging device 120.
  • the computing device 102 includes instructions 108 stored by the memory resource 106 that are executed by the processor 104 to receive an image captured by the imaging device 120.
  • the image captured by the imaging device 120 can be a still image, video image, infrared image, or other type of image that is provided through a communication path 118.
  • the imaging device 120 may be capturing video images of an area where objects or a human user may be stationary or moving through the area.
  • the area captured by the imaging device 120 is a viewing area.
  • the captured image from the imaging device 120 is referred to as an original image or an image that has not been altered by the computing device 102.
  • the captured image is a portion of frames from a plurality of frames captured by the imaging device 120.
  • the imaging device 120 can include a video imaging device that captures a plurality of frames or a video.
  • the captured image from the imaging device 120 includes a frame or a portion of the frames captured by the imaging device 120.
  • a portion of the plurality of frames are utilized to identify the ROI within the video or total plurality of frames.
  • the computing device 102 includes instructions 110 stored by the memory resource 106 that are executed by the processor 104 to identify a region of interest (ROI) within the captured image of an area based on a selected ROI method.
  • the ROI is a portion of the captured image that is identified as important or identified to be highlighted within the viewing area of the captured image.
  • the ROI is a portion of the image that is selected utilizing an ROI method.
  • an ROI method includes, but is not limited to: a human user identification method, a text identification method, an object identification method, among other methods for identifying what is a ROI within the captured image.
  • a ROI method is selected by a user when utilizing the imaging device 120.
  • the user may indicate how they intend to utilize the imaging device 120.
  • the user can select a ROI method associated with a teleconference.
  • the ROI method is utilized to identify a human user within the viewing area and to identify a boundary box that surrounds the human user and/or a particular portion of the human user.
  • the ROI method can allow a boundary box to be manually selected by a user.
  • the boundary box that is identified by the ROI method or by the user can be utilized to determine coordinates within the image that define the boundary box.
  • the ROI method allows for multiple human users to be the ROI at different times during a video presentation.
  • the ROI method can be a gesture ROI method that allows a user to make a gesture to be detected as the ROI for a time period.
  • the user will be the ROI of the video until a different user makes the gesture.
  • the different user will be the ROI for a time period after making the gesture.
  • a first user that makes the gesture can be the ROI for a first time period and the ROI can switch to a second user when the second user makes the gesture.
  • a gesture can include a hand gesture (e.g., wave, finger point, etc.) or other type of gesture to notify the gesture ROI method to select the user as the ROI.
  • the ROI method includes a voice recognition ROI method.
  • the voice recognition ROI method can identify that the first user is speaking and make the first user the ROI of the video image.
  • the voice recognition ROI method identifies the second user and switches the ROI from the first user to the second user.
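  • The switching behavior described for the gesture and voice recognition ROI methods can be sketched as a small state holder: the most recent user to trigger (by gesture or by speaking) stays the ROI until another user triggers. This is a minimal sketch, not the disclosed implementation; the function name `roi_timeline` and the event format (frame index plus an optional user label) are assumptions made for illustration, and the detection of the gesture or voice itself is out of scope.

```python
from typing import Iterable, List, Optional, Tuple


def roi_timeline(
    events: Iterable[Tuple[int, Optional[str]]]
) -> List[Tuple[int, Optional[str]]]:
    """Return (frame_index, current ROI user) for every frame.

    Each event is (frame_index, user) where `user` is the label of the user
    who gestured or spoke in that frame, or None if nobody triggered.
    The ROI stays on the last triggering user until someone else triggers.
    """
    current: Optional[str] = None  # no ROI user until the first trigger
    timeline: List[Tuple[int, Optional[str]]] = []
    for frame_index, triggering_user in events:
        if triggering_user is not None:
            current = triggering_user
        timeline.append((frame_index, current))
    return timeline


if __name__ == "__main__":
    events = [(0, None), (1, "first"), (2, None), (3, "second"), (4, None)]
    for frame_index, user in roi_timeline(events):
        print(frame_index, user)
```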
  • the computing device 102 includes instructions to determine coordinates of the ROI based on the selected ROI method. In some examples, the computing device 102 includes instructions to crop the identified ROI portion based on coordinates within the captured image identified by the selected ROI method. For example, the determined coordinates can be a border that surrounds the ROI that is to be provided to a remote device and/or an application that receives images captured by the imaging device 120.
  • the computing device 102 includes instructions 112 stored by the memory resource 106 that are executed by the processor 104 to crop the identified ROI portion from the captured image.
  • the computing device 102 crops or removes a portion of the original image captured by the imaging device 120 based on the coordinates of the boundary box associated with the ROI. In some examples, the computing device 102 removes a portion of the original image from an edge of the original image to the coordinates of the boundary box of the ROI. For example, the coordinates of the ROI can identify a box or shape that encloses or surrounds the ROI. In these examples, the computing device 102 removes the portions of the original image that are not within the boundary box or shape that encloses the ROI.
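  • The cropping step can be illustrated with a short sketch. It assumes a grayscale image stored as a list of pixel rows and a boundary box given as (left, top, right, bottom) pixel coordinates with exclusive right and bottom edges; both conventions, and the name `crop_to_box`, are assumptions for illustration and are not specified in the text.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of grayscale pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop_to_box(image: Image, box: Box) -> Image:
    """Keep only the pixels inside the boundary box, clamped to the image."""
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = box
    # Clamp the box so coordinates outside the frame do not cause errors.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("boundary box does not overlap the image")
    return [row[left:right] for row in image[top:bottom]]


if __name__ == "__main__":
    original = [[10 * r + c for c in range(6)] for r in range(4)]
    print(crop_to_box(original, (2, 1, 5, 3)))  # [[12, 13, 14], [22, 23, 24]]
```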
  • the ROI method includes one of: a face recognition method, a human user detection method, and a text recognition method.
  • the face recognition ROI identifies a face or identifies an identity of a human user within the original image captured by the imaging device 120.
  • the face recognition ROI method identifies an area that surrounds a face or portion of a user that includes the face of the user.
  • the face recognition ROI method is utilized to identify a face of the user and generates a boundary box that includes the face of the user and a portion of the user’s body.
  • the portion of the user’s body is defined by the computing device 102 and adjusted based on the location of the face of the user.
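  • One way to derive a boundary box that covers the face plus a portion of the body is to grow the detected face box by multiples of the face size. The sketch below assumes a face box is already available from some detector; the function name `expand_face_box` and the margin factors (`widen`, `head_room`, `extend_down`) are hypothetical choices for illustration, not values taken from the disclosure.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def expand_face_box(
    face: Box,
    image_size: Tuple[int, int],
    widen: float = 1.0,
    head_room: float = 0.5,
    extend_down: float = 1.5,
) -> Box:
    """Grow a face box to also cover part of the body, clamped to the image.

    The margins are expressed as multiples of the face width/height, so the
    resulting box follows the location and size of the face.
    """
    left, top, right, bottom = face
    width, height = image_size
    face_w = right - left
    face_h = bottom - top
    new_left = max(0, left - int(face_w * widen))
    new_right = min(width, right + int(face_w * widen))
    new_top = max(0, top - int(face_h * head_room))
    new_bottom = min(height, bottom + int(face_h * extend_down))
    return (new_left, new_top, new_right, new_bottom)


if __name__ == "__main__":
    print(expand_face_box((800, 200, 1000, 450), (1920, 1080)))
    # (600, 75, 1200, 825)
```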
  • the human user detection ROI method includes a method for identifying whether a human user is within the original image captured by the imaging device 120.
  • the human user detection ROI method utilizes the identified human user to generate a boundary box that includes coordinates for cropping the original image based on the location of the human user within the original image.
  • the human user detection ROI method identifies a boundary around the human user or a portion of the human user based on an identified portion of the human user (e.g., face, head, center, etc.).
  • the text detection ROI method includes a method for identifying text within the original image captured by the imaging device 120.
  • the text detection ROI method identifies text on a piece of paper or other media and defines a boundary box or perimeter that surrounds the identified text within the original image.
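  • A boundary box that surrounds all identified text can be computed as the smallest rectangle enclosing the individual text boxes. The sketch below assumes a text detector has already returned per-line boxes as (left, top, right, bottom) tuples; the name `enclose_text_boxes` and the `padding` parameter are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def enclose_text_boxes(
    boxes: List[Box], image_size: Tuple[int, int], padding: int = 0
) -> Box:
    """Return the smallest box containing every text box, plus padding."""
    if not boxes:
        raise ValueError("no text boxes were provided")
    width, height = image_size
    left = max(0, min(b[0] for b in boxes) - padding)
    top = max(0, min(b[1] for b in boxes) - padding)
    right = min(width, max(b[2] for b in boxes) + padding)
    bottom = min(height, max(b[3] for b in boxes) + padding)
    return (left, top, right, bottom)


if __name__ == "__main__":
    lines_of_text = [(100, 50, 300, 80), (120, 90, 400, 120)]
    print(enclose_text_boxes(lines_of_text, (640, 480), padding=10))
    # (90, 40, 410, 130)
```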
  • the computing device 102 includes instructions to receive a boundary box applied to the captured image and identify an area within the boundary box as the ROI.
  • the ROI portion of the image is defined by a boundary box or perimeter shape that encloses the ROI portion of the image. In this way, the portions that are not part of the ROI portion of the image are removed or cropped from the image when the image is cropped utilizing the boundary box.
  • the computing device 102 includes instructions to alter a size of the cropped ROI portion to an original size of the captured image.
  • the cropped ROI portion of the image is smaller than the size of the original image.
  • the original image is generated to be utilized by an application. For this reason, the computing device 102 may alter the size of the cropped ROI portion to match the size of the original image such that the cropped ROI portion can be utilized by the same application.
  • increasing the size of the ROI portion may degrade the image quality of the ROI portion.
  • increasing the size of an image may increase a blurriness or distortion of the image.
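  • The quality loss from enlarging a crop can be seen with the simplest resizing scheme, nearest-neighbor interpolation, in which each source pixel is repeated to fill the larger grid. This is only an illustration of why plain upscaling adds no detail; the text does not state which resizing method is used, and `resize_nearest` is a hypothetical helper.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values


def resize_nearest(image: Image, new_width: int, new_height: int) -> Image:
    """Resize by copying, for each output pixel, the nearest source pixel."""
    old_height = len(image)
    old_width = len(image[0])
    return [
        [
            image[y * old_height // new_height][x * old_width // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]


if __name__ == "__main__":
    crop = [[1, 2], [3, 4]]
    for row in resize_nearest(crop, 4, 4):
        print(row)
    # [1, 1, 2, 2]
    # [1, 1, 2, 2]
    # [3, 3, 4, 4]
    # [3, 3, 4, 4]
```

  • The enlarged result contains the same four values as the crop, only repeated in blocks, which is the kind of degradation a super-resolution step is meant to address.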
  • the computing device 102 includes instructions 114 stored by the memory resource 106 that are executed by the processor 104 to apply machine learning super-resolution to the cropped ROI portion.
  • the machine learning super-resolution method can refer to a class of techniques that can enhance (e.g., increase) a resolution of a subject utilizing a plurality of images of the subject.
  • in optical super-resolution, the diffraction limit of systems can be transcended, while in geometrical super-resolution restoration the resolution of a digital imaging sensor can be enhanced.
  • the machine learning super-resolution method can include a multiple-frame super-resolution restoration method.
  • in a multiple-frame super-resolution restoration, subpixel shifts between a plurality of images of the same scene or subject can be utilized.
  • the plurality of images can be utilized to generate an improved resolution image by deconstructing and then fusing information from the plurality of images and/or a plurality of frames of the captured video.
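  • The idea of fusing subpixel-shifted frames can be shown in a deliberately idealized one-dimensional sketch: two low-resolution samplings of the same scene, offset by exactly half a low-resolution pixel, are interleaved to recover twice the sampling density. The sketch assumes the shift is known and that there is no blur or noise; practical multiple-frame methods, including machine learning ones, must estimate the shifts and handle both. The function names are hypothetical.

```python
from typing import List


def capture_low_res(scene: List[int], offset: int) -> List[int]:
    """Simulate a low-resolution frame: keep every second scene sample."""
    return scene[offset::2]


def fuse_half_pixel_shifted(frame_a: List[int], frame_b: List[int]) -> List[int]:
    """Interleave two frames where frame_b is shifted by half a low-res pixel."""
    fused: List[int] = []
    for sample_a, sample_b in zip(frame_a, frame_b):
        fused.extend([sample_a, sample_b])
    return fused


if __name__ == "__main__":
    scene = [0, 1, 4, 9, 16, 25, 36, 49]       # the "true" high-resolution signal
    first = capture_low_res(scene, offset=0)   # [0, 4, 16, 36]
    second = capture_low_res(scene, offset=1)  # [1, 9, 25, 49]
    print(fuse_half_pixel_shifted(first, second) == scene)  # True
```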
  • the pixel distance and/or physical distance can be determined based on a super-resolution restoration method (e.g., machine learning super-resolution) utilized to generate the third image utilizing the first image and the second image as input images for the super-resolution restoration method.
  • the super-resolution restoration method can be utilized for a particular pixel distance between images to allow the generated third image to include a relatively higher resolution than the first image and the second image.
  • the machine learning super-resolution method increases a quality of the cropped ROI portion of the image to a particular quality level.
  • the machine learning super-resolution method is utilized to remove distorted properties and/or artifacts from the cropped ROI portion. For example, when enlarging the ROI portion to the size of the original captured image size, the properties of the ROI portion of the image can be distorted.
  • the machine learning super-resolution method can utilize the original size of the ROI portion and the enlarged ROI portion to generate a higher quality version of the ROI portion at the enlarged size.
  • the computing device 102 includes instructions 116 stored by the memory resource 106 that are executed by the processor 104 to provide the cropped ROI portion to an application to be displayed.
  • the enlarged ROI portion with the increased quality from the machine learning super-resolution method can be provided to the application that can be accessed by a remote device and/or a remote display.
  • the remote device may be able to access the relatively high-quality ROI portion and not have access to the original image captured by the imaging device 120.
  • the computing device 102 can utilize a driver device transform (DDT) to intercept image data captured by the imaging device 120.
  • the driver device transform can intercept the image data and alter the image data to include additional elements or altered elements such as including or cropping out an ROI portion.
  • a proxy camera or virtual camera can be utilized to intercept the image data and alter the image data to include the additional elements or altered elements.
  • the computing device 102 can intercept the video images or still images transmitted by the imaging device 120, alter the images, and then transmit the images to a display device or remote device. In this way, a user of the display device or remote device can view the altered images without viewing the original images captured by the imaging device 120.
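  • The intercept, alter, and transmit flow can be sketched as a wrapper that sits between a frame source and its consumer, in the spirit of the proxy or virtual camera mentioned above. The class name `VirtualCamera`, its `read` method, and the use of a plain Python iterable as the frame source are assumptions for illustration; a real driver-level transform would be implemented against the operating system's camera stack.

```python
from typing import Callable, Iterable, Iterator, List

Image = List[List[int]]  # rows of grayscale pixel values


class VirtualCamera:
    """Hands out altered frames so consumers never see the original ones."""

    def __init__(self, source: Iterable[Image], alter: Callable[[Image], Image]):
        self._source: Iterator[Image] = iter(source)
        self._alter = alter

    def read(self) -> Image:
        """Return the next altered frame (raises StopIteration when exhausted)."""
        original = next(self._source)
        return self._alter(original)


if __name__ == "__main__":
    frames = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

    def keep_left_column(frame: Image) -> Image:
        # Stand-in for the crop, resize, and super-resolution steps.
        return [row[:1] for row in frame]

    camera = VirtualCamera(frames, keep_left_column)
    print(camera.read())  # [[1], [3]]
    print(camera.read())  # [[5], [7]]
```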
  • the computing device 102 can include a processor 104 communicatively coupled to a memory resource 106 through a communication path.
  • the processor 104 can include, but is not limited to: a central processing unit (CPU), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a metal-programmable cell array (MPCA), a semiconductor-based microprocessor, or other combination of circuitry and/or logic to orchestrate execution of instructions 108, 110, 112, 114, 116.
  • the computing device can include instructions 108, 110, 112, 114, 116, stored on a machine-readable medium (e.g., memory resource 106, non-transitory computer-readable medium, etc.) and executable by a processor 104.
  • the computing device utilizes a non-transitory computer-readable medium storing instructions 108, 110, 112, 114, 116, that, when executed, cause the processor 104 to perform corresponding functions.
  • Figure 2 illustrates an example of a memory resource 206 storing instructions for region of interest cropped images.
  • the memory resource 206 can be a part of a computing device or controller that can be communicatively coupled to a computing system.
  • the memory resource 206 can be part of a computing device 102 as referenced in Figure 1.
  • the memory resource 206 can be communicatively coupled to a processor 204 that can execute instructions 222, 224, 226, 228, 230, 232 stored on the memory resource 206.
  • the memory resource 206 can be communicatively coupled to the processor 204 through a communication path 218.
  • a communication path 218 can include a wired or wireless connection that can allow communication between devices and/or components within a single device.
  • the memory resource 206 may be electronic, magnetic, optical, or other physical storage device that stores executable instructions.
  • a non-transitory machine-readable medium (e.g., a memory resource 206) may be, for example, a non-transitory MRM comprising Random-Access Memory (RAM), read-only memory (ROM), an Electrically Erasable Programmable ROM (EEPROM), a storage drive, an optical disc, and the like.
  • the non-transitory machine-readable medium (e.g., a memory resource 206) may be disposed within a controller and/or computing device. In this example, the executable instructions 222, 224, 226, 228, 230, 232 can be “installed” on the device.
  • the non-transitory machine-readable medium can be a portable, external, or remote storage medium, for example, that allows a computing system to download the instructions 222, 224, 226, 228, 230, 232 from the portable/external/remote storage medium.
  • the executable instructions may be part of an “installation package”.
  • the memory resource 206 can include instructions 222 to interrupt a first image captured from an imaging device from being delivered to a destination device.
  • the first image includes a first size.
  • the first image captured includes a plurality of frames of a video and a second image is a portion of the plurality of frames of the video.
  • the first image captured is a portion of a video that is captured by a video imaging device.
  • the first image captured is a first size (e.g., 1920 x 1080, etc.) that is utilized for a particular application and/or display device.
  • the imaging device is capturing images that are to be utilized by a display device or application to allow remote devices to display the first image.
  • the memory resource 206 can include instructions 224 to identify a region of interest (ROI) within the first image based on a selected ROI method.
  • ROI includes a portion of the image that is defined by either a ROI method or selected area of the first image.
  • the ROI is based on the receiving application or receiving device.
  • the ROI can be based on a selected application.
  • the selected application can include a teleconference application that can correspond to the ROI of a human user or face of the human user. In this way, the ROI is identified by a boundary box or boundary area that defines the perimeter of the ROI.
  • the memory resource 206 can include instructions 226 to generate a second image based on the ROI.
  • the second image includes a second size.
  • the second size is smaller than the first size and the second image includes a lower resolution than the first image.
  • the second image is a portion of the first image that includes the ROI.
  • the second image is an image generated by cropping the ROI based on the boundaries of the ROI such that the remaining portion of the first image is the second image.
  • the memory resource 206 can include instructions 228 to alter the second image to the first size. As described herein, the second image can be a portion of the first image that is smaller in size than the first image.
  • altering the size of the second image includes increasing the size of the second image to the size of the first image.
  • the first size of the first image is a size designated for a particular application and/or a particular receiving device. In this way, the size of the second image is increased to the first size such that the second image can be utilized by the destination application and/or destination device that was indicated when the imaging device captured the first image.
  • the memory resource 206 can include instructions 230 to apply machine learning super-resolution on the second image at the first size.
  • the machine learning super-resolution includes one of an atrous spatial pyramid pooling (ASPP), a residual-in-residual dense block (RRDB), and a subpixel convolution.
  • the machine learning super-resolution applied to the image is utilized to increase an image quality of the second image.
  • the second image may have a relatively lower resolution since it is essentially a portion of the first image that has been increased in size to the size of the first image.
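  • Of the listed building blocks, subpixel convolution is commonly understood as a convolution that produces r × r feature maps followed by a rearrangement of those maps into one image that is r times larger in each dimension. The sketch below shows only that rearrangement step (often called pixel shuffle or depth-to-space) for a single output channel, with the convolution omitted; the channel ordering follows a widely used convention and is an assumption here, since the text does not specify it.

```python
from typing import List

Image = List[List[int]]  # rows of values


def pixel_shuffle(channels: List[Image], r: int) -> Image:
    """Rearrange r*r low-resolution maps into one map upscaled by r.

    Channel index i * r + j supplies the value at sub-position (row i, col j)
    inside each r x r output block.
    """
    if len(channels) != r * r:
        raise ValueError("expected r * r input channels")
    height = len(channels[0])
    width = len(channels[0][0])
    output = [[0] * (width * r) for _ in range(height * r)]
    for i in range(r):
        for j in range(r):
            channel = channels[i * r + j]
            for y in range(height):
                for x in range(width):
                    output[y * r + i][x * r + j] = channel[y][x]
    return output


if __name__ == "__main__":
    maps = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]  # four 1x2 feature maps
    for row in pixel_shuffle(maps, 2):
        print(row)
    # [1, 3, 2, 4]
    # [5, 7, 6, 8]
```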
  • the ASPP is a semantic segmentation module for resampling a given feature layer at multiple rates prior to convolution. This amounts to probing the original image with multiple filters that have complementary effective fields of view, thus capturing objects as well as useful image context at multiple scales. Rather than actually resampling features, the mapping is implemented using multiple parallel atrous convolutional layers with different sampling rates. Although specific types of machine learning super-resolution methods are described, other types of machine learning super-resolution methods could be utilized.
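  • The parallel atrous convolutions described above can be sketched in one dimension: the same kernel is applied with its taps spaced `rate` samples apart, so larger rates see a wider field of view without resampling the input. This is a simplified illustration with a fixed kernel and no padding; an actual ASPP module uses learned two-dimensional filters and combines the branch outputs, and the names `atrous_conv1d` and `aspp_1d` are hypothetical.

```python
from typing import Dict, List, Sequence


def atrous_conv1d(signal: Sequence[float], kernel: Sequence[float], rate: int) -> List[float]:
    """'Valid' dilated convolution (no kernel flip): taps are `rate` apart."""
    span = (len(kernel) - 1) * rate  # distance covered by the dilated kernel
    return [
        sum(weight * signal[start + tap * rate] for tap, weight in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


def aspp_1d(
    signal: Sequence[float], kernel: Sequence[float], rates: Sequence[int]
) -> Dict[int, List[float]]:
    """Run the same kernel at several sampling rates in parallel branches."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


if __name__ == "__main__":
    ramp = [1, 2, 3, 4, 5, 6, 7]
    difference = [1, 0, -1]
    for rate, response in aspp_1d(ramp, difference, rates=[1, 2, 3]).items():
        print(rate, response)
    # 1 [-2, -2, -2, -2, -2]
    # 2 [-4, -4, -4]
    # 3 [-6]
```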
  • the memory resource 206 can include instructions 232 to send the second image to the destination device.
  • the second image and/or altered second image with the first size and increased image quality can be provided to the destination device and/or destination application.
  • the destination device may not receive the first image and may receive the second image with the first size instead.
  • the second image includes the ROI with the first size that was intended for the destination device and with a relatively higher image quality without a user having to alter the settings of the imaging device.
  • Figure 3 illustrates an example of a computing device 302 for region of interest cropped images.
  • the computing device 302 includes a processor 304 communicatively coupled to a memory resource 306.
  • the computing device 302 includes a processor 304 and a memory resource 306 storing instructions 342, 344, 346, 348, 350, that are executed by the processor 304 to perform particular functions.
  • the computing device 302 is communicatively coupled to a video imaging device 320 to capture images of an area.
  • the computing device 302 includes a network device 352 that can be utilized to transmit images that are captured by the video imaging device 320 to a remote device.
  • the computing device 302 communicates with the video imaging device and/or the network device 352 through a communication path 318.
  • the computing device 302 provides instructions to the video imaging device 320 and/or the network device 352 to perform particular functions.
  • the video imaging device and/or the network device 352 provide data to the computing device 302 through the communication path 318.
  • the computing device 302 includes instructions 342 stored by the memory resource 306 that can be executed by the processor 304 to interrupt a video stream from being transferred by the network device 352 to a destination device. As described herein, the computing device 302 can receive the captured video stream from the video imaging device 320 through the communication path 318. In some examples, the computing device 302 includes instructions to intercept a plurality of frames that represent a video stream captured by the video imaging device 320 before the video stream is provided to a destination device or destination application. In this way, the computing device 302 is able to alter the received video stream before providing the video stream to the destination device or destination application.
  • the computing device 302 includes instructions 344 stored by the memory resource 306 that can be executed by the processor 304 to determine a region of interest (ROI) for the video stream based on a selected ROI method.
  • the ROI of the video stream is selected utilizing a particular ROI method.
  • the ROI method is a text recognition ROI method that identifies that text or images are within the plurality of frames of the video stream.
  • the video imaging device 320 captures a video stream that includes a white board or chalk board within an area.
  • the text recognition ROI method can identify that text is present within the video stream.
  • the white board or chalk board with text is selected as the ROI for the video stream.
  • the entire surface of the board is selected as the ROI and in other examples only a portion of the board with text or images is selected as the ROI. In this way, the images provided to a destination device focus on the text or images that are present on the board.
  • the ROI method is switched or selected by a human user.
  • the text recognition ROI method is selected for a first period of time and the ROI for the video stream includes the text on a white board or chalk board.
  • a human user recognition or facial recognition ROI method is selected or switched from the text recognition ROI method for a second period of time.
  • the ROI for the video stream is switched from the text or symbols of the board to a human user within the video stream. This can be helpful for lectures or other types of presentations where a focus of the presentation is the text for a first period of time and the human user for a second period of time. In these examples, the focus of the provided video stream is altered without altering the settings of the video imaging device 320.
  • the determined ROI includes coordinates within the plurality of frames that create a boundary box to be utilized to crop the determined ROI from the plurality of frames.
  • the ROI that is selected based on the ROI method includes coordinates within the plurality of frames of the video stream that are utilized to crop or remove areas of the plurality of frames.
  • an area within the boundary box or boundary area is kept while an area outside the boundary box or boundary area is cropped or removed from the plurality of frames. In this way, a moving ROI is able to be focused on since each frame can be updated with corresponding coordinates such that the ROI remains as a focus point of the video stream.
  • the computing device 302 includes instructions 346 stored by the memory resource 306 that can be executed by the processor 304 to crop the determined ROI from a plurality of frames of the video stream.
  • cropping the determined ROI from the plurality of frames of the video stream includes removing areas not within the ROI portion of the plurality of frames.
  • each frame is analyzed by the ROI method to identify a corresponding ROI and coordinates within the frame to move with an ROI object or human user that moves during the video stream.
  • the computing device 302 includes instructions to alter a size of the cropped ROI to an original size of the video stream.
  • the video imaging device 320 can capture a video stream with a particular size based on the destination device or destination application. In this way, the cropped portion of the original video stream may be smaller than the original video stream and may be enlarged to be the particular size.
  • increasing the size of the ROI portion results in distortions or lowered image quality. For example, the enlarged ROI portion may appear blurry compared to the original video stream captured by the video imaging device 320.
  • the computing device 302 includes instructions 348 stored by the memory resource 306 that can be executed by the processor 304 to apply a convolutional neural network (CNN) super resolution (e.g., super resolution convolutional neural network (SR-CNN), etc.) on the cropped ROI from the plurality of frames.
  • the CNN-SR or SR-CNN is a model that can be utilized to increase an image quality of the cropped ROI that has been enlarged to the size of the original video stream.
  • the CNN-SR or SR-CNN model is capable of removing distorted features from the cropped ROI image based on a plurality of weights associated with the model.
  • the cropped ROI image and original video stream frames are utilized by the CNN-SR model to determine conversions that will increase the image quality of the cropped ROI.
  • the computing device 302 includes instructions 350 stored by the memory resource 306 that can be executed by the processor 304 to instruct the network device 352 to send the plurality of frames in response to the machine learning super resolution being applied to the cropped ROI. As described herein, the computing device 302 intercepts the original video stream from the video imaging device 320, alters the plurality of frames of the video stream based on the ROI, and then instructs the network device 352 to provide the altered plurality of frames to the destination device or destination application.
  • settings of the video imaging device 320 remain constant.
  • the image provided to the destination device and/or the destination application is altered without having to alter settings of the video imaging device 320.
  • a zoom or direction of the video imaging device 320 remains constant while the image provided to the destination device is altered based on the ROI.
  • the provided image to the destination device or destination application is focused on the ROI without having a human user altering the digital settings (e.g., digital zoom, digital focus, etc.) or physical settings (e.g., lens zoom, angle, direction, etc.) of the video imaging device 320.
  • Figure 4 illustrates an example of method 460 for region of interest cropped images.
  • the method 460 is executed by a computing device or system.
  • the method 460 is executable by the system 340 as referenced in Figure 3.
  • the method 460 is utilized to alter image data captured by an imaging device before providing the altered image to a destination device or destination application.
  • the method 460 includes capturing an image by a webcam 462.
  • the webcam 462 includes an imaging device or video imaging device that is connected to a computing device through a communication path.
  • the webcam 462 is a device that captures a particular area, and its position or zoom level may be difficult to alter during use.
  • the webcam 462 can be utilized with applications, such as teleconference applications.
  • the method 460 includes providing the raw full frame or original image captured by the webcam 462 to an ROI detection 464.
  • the ROI detection 464 includes instructions that are executed by a computing device or controller to identify an ROI within the raw full frame or original image captured by the webcam 462.
  • the ROI detection 464 can include a particular ROI selection method to automatically and dynamically select the ROI for a plurality of frames of the video stream.
  • the method 460 allows a plurality of user options 461 to be selected by a user to customize the ROI detection. For example, a particular ROI detection method can be selected for a particular time period and altered to a different ROI detection method for a different time period.
  • the method 460 includes providing the full raw frame and ROI coordinates to an ROI cropping alignment 466.
  • the ROI cropping alignment 466 is able to crop or remove portions of the full raw frame based on the ROI coordinates.
  • the ROI cropping alignment 466 can remove a portion of the full raw frame that is not within the coordinates of the ROI.
  • the ROI portion of the full raw frame is a size that is smaller than the full raw frame.
  • the ROI cropping alignment 466 aligns the ROI image such that text or images are aligned with a particular display of a receiving device.
  • the raw full image may capture text that is at an angle with respect to the webcam 462.
  • the ROI cropping alignment 466 may align the text to appear vertical when displayed and/or remove the angle of the captured image. This allows a user to more easily read the text within the ROI portion of the raw full image.
  • the method 460 includes performing super-resolution 468 on the cropped ROI image received from the ROI cropping alignment 466.
  • the super-resolution 468 can include a machine learning method such as CNN (e.g., SR-CNN, etc.).
  • the super-resolution 468 is performed to increase a quality of the cropped ROI image.
  • the super-resolution 468 can be utilized to increase a size of the cropped ROI image to a size of the raw full frame image or video stream.
  • the cropped ROI image can be provided to enhance optical character recognition (OCR) 470.
  • OCR can include instructions that can be utilized to recognize text within a digital image. Other types of enhancement applications or instructions can be utilized to increase the quality of the increased sized ROI image.
  • the OCR 470 is able to recognize the text or handwriting of a user within the raw full frame and generate closed captioning that provides visible text of the handwriting or text that is within the ROI image. In this way, the ROI image can include the handwritten text from a user and also provide closed captioning of the handwritten text to the conference application 472.
  • the method 460 can provide the enhanced full frame and/or additional information related to the enhanced full frame to a conference application 472 or remote device.
  • the conference application 472 is one type of application that can be utilized with the method 460.
  • the conference application 472 can receive the enhanced full frame without receiving the raw full frame from the webcam 462.
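In some examples, the flow of method 460 can be pictured as a chain of stages applied to each raw frame. The following is a minimal sketch under stated assumptions, not the disclosed implementation: the `ROI_METHODS` registry, the two toy detectors, and the `process_frame`, `crop_align`, and `super_resolve` names are hypothetical placeholders, and frames are modeled as plain nested lists of pixel values.

```python
from typing import Callable, Dict, List, Tuple

Frame = List[List[int]]          # grayscale frame: rows of pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def detect_left_half(frame: Frame) -> Box:
    """Toy stand-in for one ROI method: selects the left half of the frame."""
    return (0, 0, len(frame[0]) // 2, len(frame))


def detect_full_frame(frame: Frame) -> Box:
    """Toy stand-in for another ROI method: selects the whole frame."""
    return (0, 0, len(frame[0]), len(frame))


# Hypothetical registry standing in for the user options 461: the user picks
# a method by name and may switch to a different one at a later time.
ROI_METHODS: Dict[str, Callable[[Frame], Box]] = {
    "left_half": detect_left_half,
    "full_frame": detect_full_frame,
}


def process_frame(
    frame: Frame,
    method_name: str,
    crop_align: Callable[[Frame, Box], Frame],
    super_resolve: Callable[[Frame, int, int], Frame],
) -> Frame:
    """Run one raw frame through detection, cropping, and enhancement."""
    box = ROI_METHODS[method_name](frame)   # ROI detection 464
    roi = crop_align(frame, box)            # ROI cropping alignment 466
    # Super-resolution 468 is asked to return a frame of the raw frame's size.
    return super_resolve(roi, len(frame[0]), len(frame))
```

In this sketch only the value returned by `process_frame` would be handed to the conference application, so the raw frame never leaves the computing device.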

Abstract

In some examples, a device includes an imaging device and a processor to: receive an image captured by the imaging device, identify a region of interest (ROI) within the captured image of an area based on a selected ROI method, crop the identified ROI portion from the captured image, apply machine learning super-resolution to the cropped ROI portion, and provide the cropped ROI portion to an application to be displayed.

Description

REGION OF INTEREST CROPPED IMAGES
Background
[0001] A computing device allows a user to utilize computing device operations for work, education, gaming, multimedia, and/or other uses. Computing devices are utilized in a non-portable setting, such as at a desktop. In other examples, the computing device allows a user to carry or otherwise bring the computing device along while in a mobile setting. These computing devices can be connected to scanner devices, cameras, and/or other image capture devices to capture images of different areas or physical devices.
Brief Description of the Drawings
[0002] Figure 1 illustrates an example of a device for region of interest cropped images.
[0003] Figure 2 illustrates an example of a memory resource storing instructions for region of interest cropped images.
[0004] Figure 3 illustrates an example of a device for region of interest cropped images.
[0005] Figure 4 illustrates an example of method for region of interest cropped images.
Detailed Description
[0006] A user may utilize a computing device for various purposes, such as for business and/or recreational use. As used herein, the term “computing device” refers to an electronic system having a processor resource and a memory resource. Examples of computing devices can include, for instance, a laptop computer, a notebook computer, a desktop computer, an all-in-one (AIO) computer, networking device (e.g., router, switch, etc.), and/or a mobile device (e.g., a smart phone, tablet, personal digital assistant, smart glasses, a wrist-worn device such as a smart watch, etc.), among other types of computing devices. As used herein, a mobile device refers to devices that are (or can be) carried and/or worn by a user.
[0007] In some examples, computing devices are utilized as teleconference devices. As used herein, a teleconference device is utilized to provide audio and/or video data to remote computing devices. In this way, a teleconference device is a computing device that communicates with remote computing devices and allows remote users to communicate through audio and/or video data transferred between the plurality of computing devices.
[0008] In some examples, a plurality of computing devices can be utilized for a teleconference by connecting to a teleconference application. In some examples, the teleconference application includes instructions that are utilized to receive audio and/or video data from the plurality of computing devices and provide the audio and/or video data to each of the plurality of computing devices. In some examples, the teleconference application includes a teleconference portal that is utilized by a plurality of computing devices to exchange audio and/or video data. As used herein, a teleconference portal refers to a gateway for a website that provides teleconferencing functions.
[0009] In some examples, a controller can intercept images captured by an imaging device (e.g., camera, video camera, etc.). In some examples, the imaging device utilizes field-programmable gate array (FPGA) machine learning hardware. In these examples, the FPGA machine learning hardware can be customized for a particular architecture (e.g., convolutional neural network, etc.) to perform the functions described herein. In these examples, the controller or computing device is able to intercept images captured by an imaging device or video imaging device. The intercepted images can be altered before being provided to an application (e.g., teleconference application, social media application, etc.) such that another user utilizing the application views the altered image and not the original image captured by the imaging device.
[0010] In some examples, an imaging device or video imaging device is adjustable to focus on a particular area or particular object within a viewable area. For example, the frame of view of an imaging device can be adjusted to capture a smaller area or a larger area. In some examples, the imaging device may be positioned at a particular location that may be adjusted to focus on a particular object or person and then be adjusted to focus on a different object or person. In these examples, a lens or digital setting may have to be adjusted to focus on the particular region of interest at a particular time. In some examples, the adjustment of the imaging device may take time and/or a region of interest may move within the viewable area, which may need additional adjusting to focus on the area of interest.
[0011] The present disclosure relates to generating region of interest (ROI) cropped images. In some examples, the ROI is detected utilizing an ROI detection method. For example, the ROI includes a human user for a particular application or particular setting. In this example, the ROI detection method is a human user detection method that provides coordinates for a human user within the viewable area of the imaging device. In some examples, the coordinates are provided to the computing device to alter the image based on the provided coordinates. For example, the coordinates may provide a boundary box that surrounds a portion of the user (e.g., head and shoulders, head to waist, head to feet, etc.) and the computing device crops the image based on the coordinates.
[0012] In these examples, the cropped portion of a captured image may not have a relatively high quality or visual quality since the cropped portion could be a relatively small portion of the original size of the image. In these examples, the computing device increases the size of the cropped portion to the original size of the image and performs a machine learning super resolution method on the cropped portion to increase the quality of the image. Once the image has been altered to the higher quality cropped portion, the image is sent or provided to an application or remote device. In this way, the application may not receive the original image and only receive the altered cropped portion of the image from the computing device. This allows a user of the imaging device and/or computing device to provide high quality images of the ROI without having to alter the settings or lens of the imaging device.
[0013] Figure 1 illustrates an example of a computing device 102 for region of interest cropped images. In some examples, the computing device 102 includes a processor 104 and a memory resource 106 to store instructions that are executed by the processor 104. In some examples, the computing device 102 includes a processor 104 and a memory resource 106 storing instructions 108, 110, 112, 114, 116, that can be executed by the processor 104 to perform particular functions. In some examples, the computing device 102 is communicatively coupled to an imaging device 120 through a communication path 118. In some examples, the communication path 118 allows the computing device 102 to send and receive signals (e.g., communication signals, electrical signals, etc.) with the imaging device 120.
[0014] In some examples, the imaging device 120 is capable of capturing an image of an area. In some examples, the imaging device 120 is a web cam or video imaging device that is capable of capturing a plurality of image frames that generate a video. In some examples, the plurality of images are provided to an application and sent to a remote device or remote display device. In some examples, the images captured by the imaging device 120 are intercepted by the computing device 102 before being provided to the remote device or remote display device. In this way, an altered image of a region of interest (ROI) is provided to the remote device without providing the original image captured by the imaging device 120.
[0015] The computing device 102 includes instructions 108 stored by the memory resource 106 that are executed by the processor 104 to receive an image captured by the imaging device 120. As described herein, the image captured by the imaging device 120 can be a still image, video image, infrared image, or other type of image that is provided through a communication path 118. In some examples, the imaging device 120 may be capturing video images of an area where objects or human users may be stationary or moving through the area. In some examples, the area captured by the imaging device 120 is a viewing area. In some examples, the captured image from the imaging device 120 is referred to as an original image or an image that has not been altered by the computing device 102.
[0016] In some examples, the captured image is a portion of frames from a plurality of frames captured by the imaging device 120. As described herein, the imaging device 120 can include a video imaging device that captures a plurality of frames or a video. In some examples, the captured image from the imaging device 120 includes a frame or a portion of the frames captured by the imaging device 120. In some examples, a portion of the plurality of frames are utilized to identify the ROI within the video or total plurality of frames.
[0017] The computing device 102 includes instructions 110 stored by the memory resource 106 that are executed by the processor 104 to identify a region of interest (ROI) within the captured image of an area based on a selected ROI method. In some examples, the ROI is a portion of the captured image that is identified as important or identified to be highlighted within the viewing area of the captured image. In some examples, the ROI is a portion of the image that is selected utilizing an ROI method. As described herein, an ROI method includes, but is not limited to: a human user identification method, a text identification method, an object identification method, among other methods for identifying what is an ROI within the captured image.
[0018] In some examples, an ROI method is selected by a user when utilizing the imaging device 120. In this way, the user may indicate how they intend to utilize the imaging device 120. For example, the user can select an ROI method associated with a teleconference. In this example, the ROI method is utilized to identify a human user within the viewing area and identify a boundary box that surrounds the human user and/or a particular portion of the human user. In some examples, the ROI method can allow a boundary box to be manually selected by a user. In some examples, the boundary box that is identified by the ROI method or by the user can be utilized to determine coordinates within the image that define the boundary box.
[0019] In some examples, the ROI method allows for multiple human users to be the ROI at different times during a video presentation. For example, the ROI method can be a gesture ROI method that allows a user to make a gesture to be detected as the ROI for a time period. In some examples, the user will be the ROI of the video until a different user makes the gesture. In this example, the different user will be the ROI for a time period after making the gesture. In this way, a first user that makes the gesture can be the ROI for a first time period and the ROI can switch to a second user when the second user makes the gesture. As used herein, a gesture can include a hand gesture (e.g., wave, finger point, etc.) or other type of gesture to notify the gesture ROI method to select the user as the ROI.
[0020] In other examples, the ROI method includes a voice recognition ROI method. In this way, when a first user speaks, the voice recognition ROI method can identify that the first user is speaking and make the first user the ROI of the video image. In a similar way as the gesture ROI method, when a second user speaks, the voice recognition ROI method identifies the second user and switches the ROI from the first user to the second user.
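In some examples, the switching behavior shared by the gesture ROI method and the voice recognition ROI method can be modeled as a small piece of state: the ROI stays with the most recent user who triggered it until a different user does. The following is a minimal sketch under that assumption; the function name `active_roi_users` and the event format are hypothetical, and detecting the gesture or voice itself is outside the sketch.

```python
from typing import Iterable, List, Optional, Tuple


def active_roi_users(
    events: Iterable[Tuple[int, Optional[str]]]
) -> List[Optional[str]]:
    """Return which user is the ROI for each frame.

    Each event is (frame_index, triggering_user), where triggering_user is
    the user who gestured or spoke in that frame, or None if nobody did.
    """
    current: Optional[str] = None   # no ROI user until someone triggers
    per_frame: List[Optional[str]] = []
    for _frame_index, triggering_user in events:
        if triggering_user is not None:
            current = triggering_user   # switch the ROI to the new user
        per_frame.append(current)
    return per_frame
```

For example, if a first user gestures in frame 1 and a second user gestures in frame 3, the first user is the ROI for frames 1 and 2 and the second user is the ROI from frame 3 onward.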
[0021] In some examples, the computing device 102 includes instructions to determine coordinates of the ROI based on the selected ROI method. In some examples, the computing device 102 includes instructions to crop the identified ROI portion based on coordinates within the captured image identified by the selected ROI method. For example, the determined coordinates can be a border that surrounds the ROI that is to be utilized for providing to a remote device and/or an application that is being provided with an image captured by the imaging device 120.
[0022] The computing device 102 includes instructions 112 stored by the memory resource 106 that are executed by the processor 104 to crop the identified ROI portion from the captured image. As described herein, the computing device 102 crops or removes a portion of the original image captured by the imaging device 120 based on the coordinates of the boundary box associated with the ROI. In some examples, the computing device 102 removes a portion of the original image from an edge of the original image to the coordinates of the boundary box of the ROI. For example, the coordinates of the ROI can identify a box or shape that encloses or surrounds the ROI. In these examples, the computing device 102 removes the portions of the original image that are not within the boundary box or shape that encloses the ROI.
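In some examples, cropping by boundary box coordinates amounts to keeping the rows and columns inside the box and discarding everything between the box and the image edges. The following is a minimal sketch, assuming a rectangular box given as (left, top, right, bottom) pixel coordinates with exclusive right and bottom edges and an image stored as nested lists; the name `crop_to_box` and this coordinate convention are illustrative assumptions rather than details of the disclosure.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop_to_box(image: Image, box: Box) -> Image:
    """Keep only the pixels inside the box; everything outside is removed."""
    height, width = len(image), len(image[0])
    left, top, right, bottom = box
    # Clamp to the image so a box that spills past an edge still crops safely.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("boundary box does not overlap the image")
    return [row[left:right] for row in image[top:bottom]]
```

For a video, the same function could be called once per frame with that frame's coordinates, which is one way a moving ROI can stay in view.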
[0023] In some examples, the ROI method includes one of: a face recognition method, a human user detection method, and a text recognition method. In some examples, the face recognition ROI method identifies a face or identifies an identity of a human user within the original image captured by the imaging device 120. In these examples, the face recognition ROI method identifies an area that surrounds a face or portion of a user that includes the face of the user. In some examples, the face recognition ROI method is utilized to identify a face of the user and generates a boundary box that includes the face of the user and a portion of the user’s body. In some examples, the portion of the user’s body is defined by the computing device 102 and adjusted based on the location of the face of the user.
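In some examples, a boundary box that includes the face and a portion of the user's body can be derived from the face box alone. The following is a minimal sketch of one such derivation; the function name `expand_face_box`, the scale factors, and the half-face headroom are hypothetical tuning choices introduced for illustration and are not taken from the disclosure.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def expand_face_box(
    face: Box,
    image_size: Tuple[int, int],
    width_scale: float = 3.0,    # assumed: shoulders span about 3 face widths
    height_scale: float = 4.0,   # assumed: box is about 4 face heights tall
) -> Box:
    """Grow a face box into a head-and-shoulders box, clamped to the image."""
    left, top, right, bottom = face
    image_width, image_height = image_size
    face_width, face_height = right - left, bottom - top
    center_x = (left + right) / 2

    new_width = face_width * width_scale
    new_left = int(round(center_x - new_width / 2))
    new_right = int(round(center_x + new_width / 2))
    # Leave half a face height of headroom above the face.
    raw_top = top - 0.5 * face_height
    new_top = int(round(raw_top))
    new_bottom = int(round(raw_top + face_height * height_scale))

    # The box follows the face location but never leaves the image.
    return (
        max(0, new_left),
        max(0, new_top),
        min(image_width, new_right),
        min(image_height, new_bottom),
    )
```

Because the box is recomputed from the face location, it moves with the user from frame to frame.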
[0024] In some examples, the human user detection ROI method includes a method for identifying whether a human user is within the original image captured by the imaging device 120. In some examples, the human user detection ROI method utilizes the identified human user to generate a boundary box that includes coordinates for cropping the original image based on the location of the human user within the original image. In some examples, the human user detection ROI method identifies a boundary around the human user or a portion of the human user based on an identified portion of the human user (e.g., face, head, center, etc.).
[0025] In some examples, the text detection ROI method includes a method for identifying text within the original image captured by the imaging device 120. In some examples, the text detection ROI method identifies text on a piece of paper or other media and defines a boundary box or perimeter that surrounds the identified text within the original image.
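In some examples, a text detector reports several separate text regions (for instance, one per line or word), and the boundary box for the ROI is the smallest rectangle that encloses all of them. The following is a minimal sketch of that step only; the name `enclosing_box` and the optional `margin` parameter are hypothetical, and the text detection itself is assumed to happen elsewhere.

```python
from typing import Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def enclosing_box(text_boxes: Sequence[Box], margin: int = 0) -> Box:
    """Return the smallest box that surrounds every detected text region."""
    if not text_boxes:
        raise ValueError("no text regions were detected")
    left = min(box[0] for box in text_boxes) - margin
    top = min(box[1] for box in text_boxes) - margin
    right = max(box[2] for box in text_boxes) + margin
    bottom = max(box[3] for box in text_boxes) + margin
    return (left, top, right, bottom)
```

A margin can keep the text from touching the edge of the cropped image; the resulting box may still need clamping to the image bounds before cropping.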
[0026] In some examples, the computing device 102 includes instructions to receive a boundary box applied to the captured image and identify an area within the boundary box as the ROI. As described herein, the ROI portion of the image is defined by a boundary box or perimeter shape that encloses the ROI portion of the image. In this way, the portions that are not part of the ROI portion of the image are removed or cropped from the image when the image is cropped utilizing the boundary box.
[0027] In some examples, the computing device 102 includes instructions to alter a size of the cropped ROI portion to an original size of the captured image. In some examples, the cropped ROI portion of the image is smaller than the size of the original image. In some examples, the original image is generated to be utilized by an application. For this reason, the computing device 102 may alter the size of the cropped ROI portion to match the size of the original image such that the cropped ROI portion can be utilized by the same application. However, as described herein, increasing the size of the ROI portion may degrade the image quality of the ROI portion. In some examples, increasing the size of an image may increase a blurriness or distortion of the image.
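In some examples, the quality loss from enlarging can be seen with the simplest resizing scheme, nearest-neighbor interpolation, in which each source pixel is simply repeated. The following is a minimal sketch, assuming nearest-neighbor resizing on a nested-list image; the disclosure does not state which interpolation is used, and `resize_nearest` is a hypothetical name.

```python
from typing import List

Image = List[List[int]]


def resize_nearest(image: Image, new_width: int, new_height: int) -> Image:
    """Resize by copying, for each output pixel, the nearest source pixel."""
    source_height, source_width = len(image), len(image[0])
    return [
        [
            image[y * source_height // new_height][x * source_width // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]
```

Enlarging a 2 x 2 crop to 4 x 4 this way turns every pixel into a 2 x 2 block: the output has the requested size but no additional detail, which is the gap the super-resolution step is meant to address.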
[0028] The computing device 102 includes instructions 114 stored by the memory resource 106 that are executed by the processor 104 to apply machine learning super-resolution to the cropped ROI portion. In some examples, the machine learning super-resolution method can refer to a class of techniques that can enhance (e.g., increase) a resolution of a subject utilizing a plurality of images of the subject. In an optical super-resolution restoration method, the diffraction limit of systems can be transcended, while in geometrical super-resolution restoration the resolution of a digital imaging sensor can be enhanced. In some examples, the machine learning super-resolution method can include a multiple-frame super-resolution restoration method. In a multiple-frame super-resolution restoration, subpixel shifts between a plurality of images of the same scene or subject can be utilized. In some examples, the plurality of images can be utilized to generate an improved resolution image by deconstructing and then fusing information from the plurality of images and/or a plurality of frames of the captured video.
[0029] In some examples, the pixel distance and/or physical distance can be determined based on a super-resolution restoration method (e.g., machine learning super-resolution) utilized to generate the third image utilizing the first image and the second image as input images for the super-resolution restoration method. For example, the super-resolution restoration method can be utilized for a particular pixel distance between images to allow the generated third image to include a relatively higher resolution than the first image and the second image.
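In some examples, the idea of fusing frames that differ by a subpixel shift can be illustrated in one dimension. The following is a deliberately idealized sketch and not the machine learning method of the disclosure: it assumes two frames of the same scene, point-sampled, with the second frame offset by exactly half a pixel from the first, so that interleaving their samples yields a third signal at twice the resolution. The name `fuse_half_pixel_shifted` is hypothetical.

```python
from typing import List


def fuse_half_pixel_shifted(first: List[int], second: List[int]) -> List[int]:
    """Interleave two 1-D frames where `second` is shifted by half a pixel.

    Sample i of `first` sits at position i, and sample i of `second` sits at
    position i + 0.5, so the fused signal alternates between the two frames.
    """
    if len(first) != len(second):
        raise ValueError("frames must have the same length")
    fused: List[int] = []
    for sample_a, sample_b in zip(first, second):
        fused.extend([sample_a, sample_b])
    return fused
```

Real frames involve blur, noise, and unknown shifts, which is where registration and learned models come in; the sketch only shows why a particular pixel distance between input images lets the generated image carry more resolution than either input.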
[0030] In some examples, the machine learning super-resolution method increases a quality of the cropped ROI portion of the image to a particular quality level. In some examples, the machine learning super-resolution method is utilized to remove distorted properties and/or artifacts from the cropped ROI portion. For example, when enlarging the ROI portion to the size of the original captured image size, the properties of the ROI portion of the image can be distorted. In some examples, the machine learning super-resolution method can utilize the original size of the ROI portion and the enlarged ROI portion to generate a higher quality version of the ROI portion at the enlarged size.
[0031] The computing device 102 includes instructions 116 stored by the memory resource 106 that are executed by the processor 104 to provide the cropped ROI portion to an application to be displayed. As described herein, the enlarged ROI portion with the increased quality from the machine learning super resolution method can be provided to the application that can be accessed by a remote device and/or a remote display. In this way, the remote device may be able to access the relatively high-quality ROI portion and not have access to the original image captured by the imaging device 120.
[0032] In some examples, the computing device 102 can utilize a driver device transform (DDT) to intercept image data captured by the imaging device 120. In some examples, the driver device transform can intercept the image data and alter the image data to include additional elements or altered elements such as including or cropping out an ROI portion. In other examples, a proxy camera or virtual camera can be utilized to intercept the image data and alter the image data to include the additional elements or altered elements. In some examples, the computing device 102 can intercept the video images or still images transmitted by the imaging device 120, alter the images, and then transmit the images to a display device or remote device. In this way, a user of the display device or remote device can view the altered images without viewing the original images captured by the imaging device 120.
[0033] As described herein, the computing device 102 can include a processor 104 communicatively coupled to a memory resource 106 through a communication path. As used herein, the processor 104 can include, but is not limited to: a central processing unit (CPU), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a metal-programmable cell array (MPCA), a semiconductor-based microprocessor, or other combination of circuitry and/or logic to orchestrate execution of instructions 108, 110, 112, 114, 116. In other examples, the computing device can include instructions 108, 110, 112, 114, 116, stored on a machine-readable medium (e.g., memory resource 106, non-transitory computer-readable medium, etc.) and executable by a processor 104. In a specific example, the computing device utilizes a non-transitory computer-readable medium storing instructions 108, 110, 112, 114, 116, that, when executed, cause the processor 104 to perform corresponding functions.
[0034] Figure 2 illustrates an example of a memory resource 206 storing instructions for region of interest cropped images. In some examples, the memory resource 206 can be a part of a computing device or controller that can be communicatively coupled to a computing system. For example, the memory resource 206 can be part of a computing device 102 as referenced in Figure 1. In some examples, the memory resource 206 can be communicatively coupled to a processor 204 that can execute instructions 222, 224, 226, 228, 230, 232 stored on the memory resource 206. For example, the memory resource 206 can be communicatively coupled to the processor 204 through a communication path 218. In some examples, a communication path 218 can include a wired or wireless connection that can allow communication between devices and/or components within a single device.
[0035] The memory resource 206 may be electronic, magnetic, optical, or other physical storage device that stores executable instructions. Thus, a non-transitory machine-readable medium (MRM) (e.g., a memory resource 206) may be, for example, a non-transitory MRM comprising Random-Access Memory (RAM), read-only memory (ROM), an Electrically Erasable Programmable ROM (EEPROM), a storage drive, an optical disc, and the like. The non-transitory machine-readable medium (e.g., a memory resource 206) may be disposed within a controller and/or computing device. In this example, the executable instructions 222, 224, 226, 228, 230, 232, can be “installed” on the device. Additionally, and/or alternatively, the non-transitory machine-readable medium (e.g., a memory resource) can be a portable, external, or remote storage medium, for example, that allows a computing system to download the instructions 222, 224, 226, 228, 230, 232, from the portable/external/remote storage medium. In this situation, the executable instructions may be part of an “installation package”.
[0036] In some examples, the memory resource 206 can include instructions 222 to interrupt a first image captured from an imaging device from being delivered to a destination device. In these examples, the first image includes a first size. In some examples, the first image captured includes a plurality of frames of a video and a second image is a portion of the plurality of frames of the video. In some examples, the first image captured is a portion of a video that is captured by a video imaging device. As described herein, the first image captured is a first size (e.g., 1920 x 1080, etc.) that is utilized for a particular application and/or display device. In this way, the imaging device is capturing images that are to be utilized by a display device or application to allow remote devices to display the first image.
[0037] In some examples, the memory resource 206 can include instructions 224 to identify a region of interest (ROI) within the first image based on a selected ROI method. As described herein, the ROI includes a portion of the image that is defined by either a ROI method or selected area of the first image. In some examples, the ROI is based on the receiving application or receiving device. For example, the ROI can be based on a selected application. In some examples, the selected application can include a teleconference application that can correspond to the ROI of a human user or face of the human user. In this way, the ROI is identified by a boundary box or boundary area that defines the perimeter of the ROI.
[0038] In some examples, the memory resource 206 can include instructions 226 to generate a second image based on the ROI. In these examples, the second image includes a second size. In some examples, the second size is smaller than the first size and the second image includes a lower resolution than the first image. In some examples, the second image is a portion of the first image that includes the ROI. In some examples, the second image is an image generated by cropping the ROI based on the boundaries of the ROI such that the remaining portion of the first image is the second image.
[0039] In some examples, the memory resource 206 can include instructions 228 to alter the second image to the first size. As described herein, the second image can be a portion of the first image that is smaller in size than the first image. In some examples, altering the size of the second image includes increasing the size of the second image to the size of the first image. In some examples, the first size of the first image is a size designated for a particular application and/or a particular receiving device. In this way, the size of the second image is increased to the first size such that the second image can be utilized by the destination application and/or destination device that was indicated when the imaging device captured the first image.
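By way of a non-limiting illustration, the cropping of the ROI and the enlargement of the second image to the first size can be sketched as follows. The sketch assumes a grayscale image stored as nested lists, a boundary box expressed as (left, top, right, bottom) with exclusive right and bottom edges, and nearest-neighbor sampling for the enlargement. The names `crop` and `resize_nearest` are hypothetical and are not taken from the disclosure, which does not specify an interpolation method.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    """Keep only the pixels inside the boundary box."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def resize_nearest(image: Image, width: int, height: int) -> Image:
    """Resize an image to width x height with nearest-neighbor sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


# A 4 x 4 "first image" whose ROI is the lower-right 2 x 2 corner.
first_image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
]
roi = (2, 2, 4, 4)

second_image = crop(first_image, roi)          # second size: 2 x 2
enlarged = resize_nearest(second_image, 4, 4)  # altered to the first size: 4 x 4
```

In this sketch each ROI pixel is simply repeated, which is why an enlarged ROI tends to look blocky or blurry before any super-resolution is applied.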
[0040] In some examples, the memory resource 206 can include instructions 230 to apply machine learning super-resolution on the second image at the first size. In some examples, the machine learning super resolution includes one of an atrous spatial pyramid pooling (ASPP), residual in residual dense block (RRDB), and subpixel convolution. As described herein, the machine learning super-resolution applied to the image is utilized to increase an image quality of the second image. As described herein, the second image may have a relatively lower resolution since it is essentially a portion of the first image that has been increased in size to the size of the first image.
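For illustration only, the rearrangement step of a subpixel convolution layer (often called pixel shuffle) can be sketched as follows. The sketch assumes the commonly used channel ordering in which r x r groups of channels are interleaved into an output that is r times larger in each spatial dimension. The convolution that would produce the input channels is omitted, and the name `pixel_shuffle` is an assumption rather than a term from the disclosure.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def pixel_shuffle(features: FeatureMap, r: int) -> FeatureMap:
    """Rearrange C*r*r channels of size H x W into C channels of size rH x rW."""
    channels = len(features)
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    height, width = len(features[0]), len(features[0][0])
    out_channels = channels // (r * r)
    out = [
        [[0.0] * (width * r) for _ in range(height * r)]
        for _ in range(out_channels)
    ]
    for c in range(out_channels):
        for y in range(height * r):
            for x in range(width * r):
                # The sub-pixel offset (y % r, x % r) selects the source channel.
                src = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = features[src][y // r][x // r]
    return out


# Four 1 x 2 channels become one 2 x 4 channel for an upscale factor of 2.
low_res = [[[0.0, 1.0]], [[10.0, 11.0]], [[20.0, 21.0]], [[30.0, 31.0]]]
high_res = pixel_shuffle(low_res, 2)
```

The spatial size grows only through this rearrangement, so the learned filters operate entirely at the lower resolution.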
[0041] In some examples, the ASPP is a semantic segmentation module for resampling a given feature layer at multiple rates prior to convolution. This amounts to probing the original image with multiple filters that have complementary effective fields of view, thus capturing objects as well as useful image context at multiple scales. Rather than actually resampling features, the mapping is implemented using multiple parallel atrous convolutional layers with different sampling rates. Although specific types of machine learning super-resolution methods are described, other types of machine learning super-resolution methods could be utilized.
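As a simplified, one-dimensional illustration of parallel atrous convolutions, the following sketch applies the same kernel at several sampling rates and combines the branches. It is an assumption-laden toy: the kernel is applied without flipping (as is usual in neural networks), borders are zero padded, and the branches are summed, whereas a full ASPP module typically uses learned two-dimensional filters and fuses the concatenated branches with a further convolution. The function names are hypothetical.

```python
from typing import List, Sequence


def atrous_conv1d(signal: Sequence[float], kernel: Sequence[float], rate: int) -> List[float]:
    """Apply a kernel whose taps are spaced `rate` samples apart (zero padded)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * rate  # a larger rate widens the field of view
            if 0 <= j < len(signal):
                total += weight * signal[j]
        out.append(total)
    return out


def aspp_1d(signal: Sequence[float], kernel: Sequence[float], rates: Sequence[int]) -> List[float]:
    """Run parallel atrous branches at several rates and sum their outputs."""
    branches = [atrous_conv1d(signal, kernel, rate) for rate in rates]
    return [sum(values) for values in zip(*branches)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]

rate_1 = atrous_conv1d(signal, kernel, 1)
rate_2 = atrous_conv1d(signal, kernel, 2)
combined = aspp_1d(signal, kernel, [1, 2])
```

The input is never resampled; only the spacing of the kernel taps changes between branches.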
[0042] In some examples, the memory resource 206 can include instructions 232 to send the second image to the destination device. As described herein, the second image and/or altered second image with the first size and increased image quality can be provided to the destination device and/or destination application. In this way, the destination device may not receive the first image and may receive the second image with the first size instead. In this way, the second image includes the ROI with the first size that was intended for the destination device and with a relatively higher image quality without a user having to alter the settings of the imaging device.
[0043] Figure 3 illustrates an example of a computing device 302 for region of interest cropped images. In some examples, the computing device 302 includes a processor 304 communicatively coupled to a memory resource 306. In some examples, the computing device 302 includes a processor 304 and a memory resource 306 storing instructions 342, 344, 346, 348, 350, that are executed by the processor 304 to perform particular functions.
[0044] In some examples, the computing device 302 is communicatively coupled to a video imaging device 320 to capture images of an area. In some examples, the computing device 302 includes a network device 352 that can be utilized to transmit images that are captured by the video imaging device 320 to a remote device. The computing device 302 communicates with the video imaging device and/or the network device 352 through a communication path 318. In some examples, the computing device 302 provides instructions to the video imaging device 320 and/or the network device 352 to perform particular functions. In other examples, the video imaging device and/or the network device 352 provide data to the computing device 302 through the communication path 318.
[0045] The computing device 302 includes instructions 342 stored by the memory resource 306 that can be executed by the processor 304 to interrupt a video stream from being transferred by the network device 352 to a destination device. As described herein, the computing device 302 can receive the captured video stream from the video imaging device 320 through the communication path 318. In some examples, the computing device 302 includes instructions to intercept a plurality of frames that represent a video stream captured by the video imaging device 320 before the video stream is provided to a destination device or destination application. In this way, the computing device 302 is able to alter the received video stream before providing the video stream to the destination device or destination application.
[0046] The computing device 302 includes instructions 344 stored by the memory resource 306 that can be executed by the processor 304 to determine a region of interest (ROI) for the video stream based on a selected ROI method. As described herein, the ROI of the video stream is selected utilizing a particular ROI method. In some examples, the ROI method is a text recognition ROI method that identifies that text or images are within the plurality of frames of the video stream. For example, the video imaging device 320 captures a video stream that includes a white board or chalk board within an area. When a human user begins to write images or text on the board, the text recognition ROI method can identify that text is present within the video stream. In this example, the white board or chalk board with text is selected as the ROI for the video stream. In some examples, the entire surface of the board is selected as the ROI and in other examples only a portion of the board with text or images is selected as the ROI. In this way, the images provided to a destination device focus on the text or images that are present on the board.
[0047] In some examples, the ROI method is switched or selected by a human user. For example, the text recognition ROI method is selected for a first period of time and the ROI for the video stream includes the text on a white board or chalk board. In this example, a human user recognition or facial recognition ROI method is selected or switched from the text recognition ROI method for a second period of time. In this way, the ROI for the video stream is switched from the text or symbols of the board to a human user within the video stream. This can be helpful for lectures or other types of presentations where a focus of the presentation is the text for a first period of time and the human user for a second period of time. In these examples, the focus of the provided video stream is altered without altering the settings of the video imaging device 320.
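One possible way to switch between ROI methods at run time is sketched below. The detectors are stand-ins: rather than performing text or person recognition, they return the bounding box of pixels carrying a hypothetical label (1 for text, 2 for a person) in a toy frame. The class `RoiSelector` and the helper `box_of_label` are illustrative names and do not come from the disclosure.

```python
from typing import Callable, Dict, List, Optional, Tuple

Frame = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive
RoiMethod = Callable[[Frame], Optional[Box]]

TEXT, PERSON = 1, 2  # hypothetical per-pixel labels used only by this toy example


def box_of_label(frame: Frame, label: int) -> Optional[Box]:
    """Stand-in detector: bounding box of all pixels equal to `label`."""
    xs = [x for row in frame for x, value in enumerate(row) if value == label]
    ys = [y for y, row in enumerate(frame) if label in row]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


class RoiSelector:
    """Holds several ROI methods and applies whichever one is currently selected."""

    def __init__(self, methods: Dict[str, RoiMethod], active: str) -> None:
        self._methods = methods
        self.select(active)

    def select(self, name: str) -> None:
        if name not in self._methods:
            raise KeyError(f"unknown ROI method: {name}")
        self._active = name

    def detect(self, frame: Frame) -> Optional[Box]:
        return self._methods[self._active](frame)


selector = RoiSelector(
    {
        "text": lambda f: box_of_label(f, TEXT),
        "person": lambda f: box_of_label(f, PERSON),
    },
    active="text",
)
frame = [
    [1, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 2],
]
first_period = selector.detect(frame)   # ROI follows the text
selector.select("person")               # user switches the ROI method
second_period = selector.detect(frame)  # ROI follows the person
```

Because only the selected method changes, the captured frame and the camera settings are the same in both periods.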
[0048] In some examples, the determined ROI includes coordinates within the plurality of frames that create a boundary box to be utilized to crop the determined ROI from the plurality of frames. As described herein, the ROI that is selected based on the ROI method includes coordinates within the plurality of frames of the video stream that are utilized to crop or remove areas of the plurality of frames. In some examples, an area within the boundary box or boundary area is kept while an area outside the boundary box or boundary area is cropped or removed from the plurality of frames. In this way, a moving ROI is able to be focused on since each frame can be updated with corresponding coordinates such that the ROI remains as a focus point of the video stream.
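To illustrate how the boundary box coordinates could be updated for each frame, the following sketch centers a fixed-size box on a tracked position and clamps the box so that it stays inside the frame. The fixed box size, the (x, y) center input, and the function names are assumptions made for the example; the disclosure only requires that each frame have corresponding coordinates.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def box_around(center: Tuple[int, int], box_size: Tuple[int, int],
               frame_size: Tuple[int, int]) -> Box:
    """Center a box of box_size on `center`, shifted if needed to fit the frame."""
    cx, cy = center
    box_w, box_h = box_size
    frame_w, frame_h = frame_size
    left = clamp(cx - box_w // 2, 0, frame_w - box_w)
    top = clamp(cy - box_h // 2, 0, frame_h - box_h)
    return (left, top, left + box_w, top + box_h)


# A subject moving diagonally across an 8 x 6 frame, tracked with a 4 x 2 box.
centers = [(1, 1), (4, 3), (7, 5)]
boxes: List[Box] = [box_around(c, (4, 2), (8, 6)) for c in centers]
```

Keeping the box size constant means every cropped frame has the same dimensions, so the same enlargement factor applies throughout the stream.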
[0049] The computing device 302 includes instructions 346 stored by the memory resource 306 that can be executed by the processor 304 to crop the determined ROI from a plurality of frames of the video stream. As described herein, cropping the determined ROI from the plurality of frames of the video stream includes removing areas not within the ROI portion of the plurality of frames. In these examples, each frame is analyzed by the ROI method to identify a corresponding ROI and coordinates within the frame to move with an ROI object or human user that moves during the video stream.
[0050] In some examples, the computing device 302 includes instructions to alter a size of the cropped ROI to an original size of the video stream. As described herein, the video imaging device 320 can capture a video stream with a particular size based on the destination device or destination application. In this way, the cropped portion of the original video stream may be smaller than the original video stream and may be enlarged to be the particular size. In some examples, increasing the size of the ROI portion results in distortions or lowered image quality. For example, the enlarged ROI portion may appear blurry compared to the original video stream captured by the video imaging device 320.
[0051] The computing device 302 includes instructions 348 stored by the memory resource 306 that can be executed by the processor 304 to apply a convolutional neural network (CNN) super resolution (e.g., super resolution convolutional neural network (SR-CNN), etc.) on the cropped ROI from the plurality of frames. In some examples, the CNN-SR or SR-CNN is a model that can be utilized to increase an image quality of the cropped ROI that has been enlarged to the size of the original video stream. For example, the CNN-SR or SR-CNN model is capable of removing distorted features from the cropped ROI image based on a plurality of weights associated with the model. In some examples, the cropped ROI image and original video stream frames are utilized by the CNN-SR model to determine conversions that will increase the image quality of the cropped ROI.
[0052] The computing device 302 includes instructions 350 stored by the memory resource 306 that can be executed by the processor 304 to instruct the network device 352 to send the plurality of frames in response to the machine learning super resolution being applied to the cropped ROI. As described herein, the computing device 302 intercepts the original video stream from the video imaging device 320, alters the plurality of frames of the video stream based on the ROI, and then instructs the network device 352 to provide the altered plurality of frames to the destination device or destination application.
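The intercept, alter, and forward sequence described above can be sketched as a generator that sits between the source of frames and the destination. Every stage is passed in as a callable, and the stages used in the demonstration are trivial stand-ins; in particular, `enhance` is an identity placeholder where a trained super-resolution model would be called. All names in the sketch are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Frame = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def intercept(
    frames: Iterable[Frame],
    detect_roi: Callable[[Frame], Box],
    crop: Callable[[Frame, Box], Frame],
    resize: Callable[[Frame, int, int], Frame],
    enhance: Callable[[Frame], Frame],
) -> Iterator[Frame]:
    """Yield altered frames only; the original frames never reach the consumer."""
    for frame in frames:
        height, width = len(frame), len(frame[0])
        box = detect_roi(frame)                    # determine the ROI
        cropped = crop(frame, box)                 # remove areas outside the ROI
        enlarged = resize(cropped, width, height)  # restore the original size
        yield enhance(enlarged)                    # super-resolution placeholder


def crop_box(frame: Frame, box: Box) -> Frame:
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]


def resize_nearest(frame: Frame, width: int, height: int) -> Frame:
    src_h, src_w = len(frame), len(frame[0])
    return [
        [frame[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


captured = [[[5, 6], [7, 8]], [[9, 1], [2, 3]]]  # two 2 x 2 frames
sent = list(
    intercept(
        captured,
        detect_roi=lambda frame: (0, 0, 1, 1),  # stand-in: top-left pixel is the ROI
        crop=crop_box,
        resize=resize_nearest,
        enhance=lambda frame: frame,            # stand-in for a trained model
    )
)
```

Because the generator yields only the altered frames, whatever consumes `sent` (for example, a network device) has no access to the frames in `captured`.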
[0053] In some examples, settings of the video imaging device 320 remain constant. As described herein, the image is provided to the destination device and/or the destination application without having to alter settings of the video imaging device 320. For example, a zoom or direction of the video imaging device 320 remains constant while the image provided to the destination device is altered based on the ROI. In this way, the image provided to the destination device or destination application is focused on the ROI without a human user having to alter the digital settings (e.g., digital zoom, digital focus, etc.) or physical settings (e.g., lens zoom, angle, direction, etc.) of the video imaging device 320.
[0054] Figure 4 illustrates an example of method 460 for region of interest cropped images. In some examples, the method 460 is executed by a computing device or system. For example, the method 460 is executable by the system 340 as referenced in Figure 3. In some examples, the method 460 is utilized to alter image data captured by an imaging device before providing the altered image to a destination device or destination application.
[0055] In some examples, the method 460 includes capturing an image by a webcam 462. The webcam 462 includes an imaging device or video imaging device that is connected to a computing device through a communication path. In some examples, the webcam 462 is a device that captures a particular area and whose position or zoom level may be difficult to alter during use. In other examples, the webcam 462 can be utilized with applications, such as teleconference applications.
[0056] In some examples, the method 460 includes providing the raw full frame or original image captured by the webcam 462 to an ROI detection 464. In some examples, the ROI detection 464 includes instructions that are executed by a computing device or controller to identify an ROI within the raw full frame or original image captured by the webcam 462. As described herein, the ROI detection 464 can include a particular ROI selection method to automatically and dynamically select the ROI for a plurality of frames of the video stream. In some examples, the method 460 allows a plurality of user options 461 to be selected by a user to customize the ROI detection. For example, a particular ROI detection method can be selected for a particular time period and altered to a different ROI detection method for a different time period.
[0057] In some examples, the method 460 includes providing the full raw frame and ROI coordinates to an ROI cropping alignment 466. The ROI cropping alignment 466 is able to crop or remove portions of the full raw frame based on the ROI coordinates. As described herein, the ROI cropping alignment 466 can remove a portion of the full raw frame that is not within the coordinates of the ROI. As described herein, the ROI portion of the full raw frame is a size that is smaller than the full raw frame. In some examples, the ROI cropping alignment 466 aligns the ROI image such that text or images are aligned with a particular display of a receiving device. For example, the raw full image may capture text that is at an angle with respect to the webcam 462. In this example, the ROI cropping alignment 466 may align the text to appear vertical when displayed and/or remove the angle of the captured image. This allows a user to more easily read the text within the ROI portion of the raw full image.
[0058] In some examples, the method 460 includes performing super-resolution 468 on the cropped ROI image received from the ROI cropping alignment 466. As described herein, the super-resolution 468 can include a machine learning method such as CNN (e.g., SR-CNN, etc.). As described herein, the super-resolution 468 is performed to increase a quality of the cropped ROI image. In other examples, the super-resolution 468 can be utilized to increase a size of the cropped ROI image to a size of the raw full frame image or video stream. In this way, the cropped ROI image can be provided to enhance optical character recognition (OCR) 470. As used herein, OCR can include instructions that can be utilized to recognize text within a digital image. Other types of enhancement applications or instructions can be utilized to increase the quality of the increased sized ROI image.
[0059] In some examples, the OCR 470 is able to recognize the text or handwriting of a user within the raw full frame and generate closed captioning that can provide generated visible text of the handwriting or text that is within the ROI image. In this way, the ROI image can include the handwritten text from a user and also provide closed captioning of the handwritten text to the conference application 472.
[0060] As described herein, the method 460 can provide the enhanced full frame and/or additional information related to the enhanced full frame to a conference application 472 or remote device. In some examples, the conference application 472 is one type of application that can be utilized with the method 460. In some examples, the conference application 472 can receive the enhanced full frame without receiving the raw full frame from the webcam 462.
[0061] In the foregoing detailed description of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration how examples of the disclosure may be practiced. These examples are described in sufficient detail to enable those of ordinary skill in the art to practice the examples of this disclosure, and it is to be understood that other examples may be utilized and that process, electrical, and/or structural changes may be made without departing from the scope of the disclosure. Further, as used herein, “a” refers to one such thing or more than one such thing.
[0062] The figures herein follow a numbering convention in which the first digit corresponds to the drawing figure number and the remaining digits identify an element or component in the drawing. For example, reference numeral 102 may refer to element 102 in Figure 1 and an analogous element may be identified by reference numeral 302 in Figure 3. Elements shown in the various figures herein can be added, exchanged, and/or eliminated to provide additional examples of the disclosure. In addition, the proportion and the relative scale of the elements provided in the figures are intended to illustrate the examples of the disclosure and should not be taken in a limiting sense.
[0063] It can be understood that when an element is referred to as being "on," "connected to", “coupled to”, or "coupled with" another element, it can be directly on, connected, or coupled with the other element or intervening elements may be present. In contrast, when an object is “directly coupled to” or “directly coupled with” another element, it is understood that there are no intervening elements (adhesives, screws, other elements, etc.).
[0064] The above specification, examples, and data provide a description of the system and method of the disclosure. Since many examples can be made without departing from the spirit and scope of the system and method of the disclosure, this specification merely sets forth some of the many possible example configurations and implementations.

Claims

What is claimed is:
1. A device, comprising: an imaging device; and a processor to: receive an image captured by the imaging device; identify a region of interest (ROI) within the captured image of an area based on a selected ROI method; crop the identified ROI portion from the captured image; apply machine learning super-resolution to the cropped ROI portion; and provide the cropped ROI portion to an application to be displayed.
2. The device of claim 1, wherein the processor is to alter a size of the cropped ROI portion to an original size of the captured image.
3. The device of claim 1, wherein the processor is to crop the identified ROI portion based on coordinates within the captured image identified by the selected ROI method.
4. The device of claim 1, wherein the ROI method includes one of: a face recognition method, a human user detection method, and text recognition method.
5. The device of claim 1, wherein the processor is to receive a boundary box applied to the captured image and identify an area within the boundary box as the ROI.
6. The device of claim 1, wherein the processor is to determine coordinates of the ROI based on the selected ROI method.
7. The device of claim 1, wherein the captured image is a portion of frames from a plurality of frames captured by the imaging device.
8. A non-transitory memory resource storing machine-readable instructions stored thereon that, when executed, cause a processor of a computing device to: interrupt a first image captured from an imaging device from being delivered to a destination device, wherein the first image includes a first size; identify a region of interest (ROI) within the first image based on a selected ROI method; generate a second image based on the ROI, wherein the second image includes a second size; alter the second image to the first size; apply machine learning super resolution on the second image at the first size; and send the second image to the destination device.
9. The memory resource of claim 8, wherein the second size is smaller than the first size and the second image includes a lower resolution than the first image.
10. The memory resource of claim 8, wherein the machine learning super resolution includes one of an atrous spatial pyramid pooling (ASPP), residual in residual dense block (RRDB), and subpixel convolution.
11. The memory resource of claim 8, wherein the first image captured includes a plurality of frames of a video and the second image is a portion of the plurality of frames of the video.
12. A device, comprising: a video imaging device; a network device; and a processor to: interrupt a video stream from being transferred by the network device to a destination device; determine a region of interest (ROI) for the video stream based on a selected ROI method; crop the determined ROI from a plurality of frames of the video stream; apply a convolutional neural network (CNN) super resolution on the cropped ROI from the plurality of frames; and instruct the network device to send the plurality of frames in response to the CNN super resolution being applied to the cropped ROI.
13. The device of claim 12, wherein the determined ROI includes coordinates within the plurality of frames that create a boundary box to be utilized to crop the determined ROI from the plurality of frames.
14. The device of claim 12, wherein the processor is to alter a size of the cropped ROI to an original size of the video stream.
15. The device of claim 12, wherein settings of the video imaging device remain constant.
PCT/US2021/054736 2021-10-13 2021-10-13 Region of interest cropped images WO2023063940A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2021/054736 WO2023063940A1 (en) 2021-10-13 2021-10-13 Region of interest cropped images

Publications (1)

Publication Number Publication Date
WO2023063940A1 true WO2023063940A1 (en) 2023-04-20

Family

ID=85988794

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/054736 WO2023063940A1 (en) 2021-10-13 2021-10-13 Region of interest cropped images

Country Status (1)

Country Link
WO (1) WO2023063940A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200105500A1 (en) * 2018-09-28 2020-04-02 Taiwan Semiconductor Manufacturing Co., Ltd. Machine learning on wafer defect review
US20200272825A1 (en) * 2019-05-27 2020-08-27 Beijing Dajia Internet Information Technology Co., Ltd. Scene segmentation method and device, and storage medium
US20200323480A1 (en) * 2017-11-27 2020-10-15 Retispec Inc. Hyperspectral Image-Guided Raman Ocular Imager for Alzheimer's Disease Pathologies
WO2021148844A1 (en) * 2020-01-23 2021-07-29 Four Ace Ltd. Biometric method and system for hand analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21960793

Country of ref document: EP

Kind code of ref document: A1