US20240013536A1 - Compound images - Google Patents
- Publication number
- US20240013536A1 (application US17/861,713)
- Authority
- US
- United States
- Prior art keywords
- region
- person
- source image
- processor
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/24—Indexing scheme for image data processing or generation, in general involving graphical user interfaces [GUIs]
Definitions
- Electronic technology has advanced to become virtually ubiquitous in society and has been used for many activities in society.
- Electronic devices are used to perform a variety of tasks, including work activities, communication, research, and entertainment.
- For instance, computers may be used to participate in virtual meetings over the Internet.
- FIG. 1 is a diagram illustrating an example of a source image that may be utilized to generate a compound image in accordance with some examples of the techniques described herein;
- FIG. 2 is a diagram illustrating an example of a compound image in accordance with some examples of the techniques described herein;
- FIG. 3 is a block diagram illustrating an example of an electronic device that may be used to generate a compound image;
- FIG. 4 is a block diagram illustrating an example of an apparatus for compound image generation;
- FIG. 5 is a flow diagram illustrating an example of a method for macro view generation; and
- FIG. 6 is a block diagram illustrating an example of a computer-readable medium for compound image generation.
- People in a video meeting may be displayed with different sizes due to the people being situated at different distances from a camera (in a meeting room, for instance). It may be difficult for a participant that is unfamiliar with other attendees to identify who is speaking.
- Some examples of the techniques described herein provide a combination of a wide angle view with an individual view of a person. For instance, showing an individual view may allow a remote participant to more clearly see a person that is sitting relatively farther from the camera (e.g., a person that may appear small in the wide angle view).
- In some examples, multiple individual views are generated.
- The individual views may be ordered (e.g., initially ordered) in accordance with an arrangement of people in the video meeting. In some examples, the order may be adjusted over time, which may indicate a speaking sequence.
- FIG. 1 is a diagram illustrating an example of a source image 160 that may be utilized to generate a compound image in accordance with some examples of the techniques described herein.
- FIG. 2 is a diagram illustrating an example of a compound image 262 in accordance with some examples of the techniques described herein. FIG. 1 and FIG. 2 are described together.
- A source image is a digital image captured by an image sensor (e.g., digital camera, web cam, etc.).
- A source image depicts an environment (e.g., conference room, meeting room, office, family room, etc.) in which people (e.g., meeting participants, attendees, etc.) may be situated.
- A source image may have a field of view that is greater than sixty degrees along a horizontal dimension.
- A source image may be captured using an image sensor with a wide-angle lens, a fisheye lens, etc.
- A source image may be captured by an image sensor (e.g., digital camera, web cam, etc.) and provided to an apparatus (e.g., electronic device, computing device, server, etc.).
- An apparatus may include an image sensor, may be coupled to an image sensor, may be in communication with an image sensor, or a combination thereof.
- In the example of FIG. 1, the source image 160 depicts seven people situated around a conference table.
- The apparatus may determine a region that depicts a person.
- The apparatus may include a processor to execute a machine learning model to detect a person (e.g., face, head and shoulders, etc.).
- Machine learning is a technique where a machine learning model (e.g., artificial neural network (ANN), convolutional neural network (CNN), etc.) is trained to perform a task based on a set of examples (e.g., data).
- Training a machine learning model may include determining weights corresponding to structures of the machine learning model.
- Artificial neural networks may be a kind of machine learning model that may be structured with nodes, layers, connections, or a combination thereof.
- A machine learning model may be trained with a set of training images.
- A set of training images may include images of an object(s) for detection (e.g., images of a user, people, etc.).
- The set of training images may be labeled with the class of object(s), location (e.g., region, bounding box, etc.) of object(s) in the images, or a combination thereof.
- The machine learning model may be trained to detect the object(s) by iteratively adjusting weights of the model(s) and evaluating a loss function(s).
- The trained machine learning model may be executed to detect the object(s) (with a degree of probability, for instance).
- The source image 160 may be utilized with computer vision techniques to detect an object(s) (e.g., a user, people, etc.).
- An apparatus may use machine learning, a computer vision technique(s), or a combination thereof to detect a person or people. For instance, an apparatus may detect a location of a person (e.g., face) in a source image and provide a region that includes (e.g., depicts) the person. For instance, the apparatus may produce a region (e.g., bounding box) around a detected face. In some examples, the region may be sized based on the size of the detected person in the image (e.g., sized proportionate to the size of the detected face, sized with a margin from a face edge or detected feature, such as eyes, etc.). In the example of FIG. 1:
- the first region 170 depicts a first person;
- the second region 172 depicts a second person;
- the third region 174 depicts a third person;
- the fourth region 176 depicts a fourth person; and
- the fifth region 178, the sixth region 180, and the seventh region 182 depict a fifth, sixth, and seventh person, respectively.
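The region sizing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the margin factor, the (x, y, w, h) box format, and the function name are assumptions for the example.

```python
# Sketch: expand a detected face box into a person region, sized
# proportionate to the face with a margin, clamped to image bounds.
# The margin value (0.5) is an illustrative assumption.

def face_to_region(face, image_size, margin=0.5):
    """face: (x, y, w, h); image_size: (width, height)."""
    x, y, w, h = face
    img_w, img_h = image_size
    dx, dy = int(w * margin), int(h * margin)
    left = max(0, x - dx)            # clamp at the left/top edges
    top = max(0, y - dy)
    right = min(img_w, x + w + dx)   # clamp at the right/bottom edges
    bottom = min(img_h, y + h + dy)
    return (left, top, right - left, bottom - top)
```

For instance, a 50×50 face detected at (100, 100) in a 1920×1080 source image would yield a 100×100 region centered on the face.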
- The apparatus may determine a top coordinate of a top region.
- A top region is a region that is nearest to the top (e.g., 0 in a height dimension or y dimension) of a source image.
- The apparatus may determine a region with a height coordinate that is nearest to the top of a source image (e.g., the smallest height coordinate or y coordinate).
- In FIG. 1, the top region is the first region 170 and the top coordinate 184 is the height coordinate of the top of the first region 170.
- Coordinates or values of an image may be expressed as increasing from left to right for a horizontal (e.g., width) dimension and increasing from top to bottom for a vertical (e.g., height) dimension.
- For instance, the source image 160 may have dimensions of 1920×1080, where an upper left corner of the source image 160 may have coordinates of (0, 0). Other coordinates, ordering (e.g., values increasing from bottom to top of an image, etc.), indexing, or a combination thereof may be utilized in some examples of the techniques described herein. Some examples of the techniques described herein may be given in terms of 1920×1080 image dimensions (e.g., resolution). Some examples of the techniques herein may utilize images with other dimensions (e.g., 3840×2160, 1280×720, etc.).
- The apparatus may determine a bottom coordinate of a bottom region.
- A bottom region is a region that is nearest to the bottom (e.g., maximum height in a height dimension or y dimension) of a source image.
- The apparatus may determine a region with a height coordinate that is nearest to the bottom of a source image (e.g., the largest height coordinate or y coordinate).
- In FIG. 1, the bottom region is the seventh region 182 and the bottom coordinate 186 is the height coordinate of the bottom of the seventh region 182.
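The top- and bottom-coordinate determination above can be sketched in a few lines. This is an illustrative sketch using the coordinate convention described here (y increases downward); the (x, y, w, h) region format is an assumption.

```python
# Sketch: find the top coordinate of the top region (smallest y) and the
# bottom coordinate of the bottom region (largest y + h).

def top_and_bottom_coordinates(regions):
    top = min(y for (x, y, w, h) in regions)           # top region's top edge
    bottom = max(y + h for (x, y, w, h) in regions)    # bottom region's bottom edge
    return top, bottom
```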
- A macro view is a view (e.g., portion) of an image (e.g., source image) that depicts multiple people.
- A macro view may include a complete width of a source image, a subset of the height of a source image, or a combination thereof.
- The apparatus may generate a macro view 188 of the source image 160 that depicts people (e.g., the first person, the second person, etc.).
- A macro view may be cropped from a source image with a resolution to include every person in the field of view.
- A macro view may be adjusted over time (e.g., over a sequence of source images) by monitoring the movement of people in the area according to person detection (e.g., machine learning detection).
- The apparatus may generate a macro view based on a top coordinate and a bottom coordinate. In FIG. 1, for instance, the apparatus may generate the macro view 188 based on the top coordinate 184 and the bottom coordinate 186. In some examples, the apparatus may determine whether a difference between a bottom coordinate and a top coordinate is greater than a threshold. In some examples, the threshold is a fraction (e.g., a quarter, a third, a half, three-fifths, a percentage, etc.) of a vertical size of a source image. For instance, the apparatus may determine whether a difference 164 between the bottom coordinate 186 and the top coordinate 184 is greater than half the vertical size (e.g., half the total height) of the source image 160. In the example of FIG. 1, the difference 164 is greater than half the vertical size of the source image 160.
- The apparatus may generate a macro view from a source image between a bottom coordinate and a top coordinate. For instance, if the difference is less than or equal to half the height of a source image, the portion of the source image between the bottom coordinate and the top coordinate may be utilized as the macro view (e.g., the portion may be cropped out of the source image and used to generate a part of a compound image, such as a top half or a bottom half).
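The threshold check above can be sketched as follows, using the half-height threshold given in the example; the function name and the `None` return for the tall-span case (where the zone-based approach described next would take over) are illustrative assumptions.

```python
# Sketch: if the vertical span of the regions fits within half the source
# image height, crop the macro view directly between the coordinates;
# otherwise defer to the zone-based selection.

def macro_view_bounds(top, bottom, image_height):
    if (bottom - top) <= image_height / 2:
        return (top, bottom)   # crop between top and bottom coordinates
    return None                # span too tall; use the zone-based approach
```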
- The apparatus may determine a target zone.
- A target zone is a portion of a source image that includes a greatest quantity of regions.
- An apparatus may partition a source image into zones.
- In FIG. 1, the source image 160 is partitioned into four zones: a first zone 190, a second zone 192, a third zone 194, and a fourth zone 196.
- A different quantity of zones (e.g., two zones, three zones, five zones, etc.) may be utilized in some examples.
- A zone may span the entire width of a source image, a subset of a height of a source image, or a combination thereof.
- Zones may be sized equally.
- For instance, the first zone 190, the second zone 192, the third zone 194, and the fourth zone 196 have a same size (e.g., area, quantity of pixels, etc.).
- An apparatus may determine zone region quantities.
- A zone region quantity is a quantity of regions associated with a zone.
- A region may be associated with a zone if the region is within (e.g., partially or completely within) the zone and is not associated with another zone.
- An apparatus may determine zone region quantities according to a sequence of zones (e.g., top zone to bottom zone).
- For instance, the apparatus may determine zone associations (e.g., count) in a sequence of the first zone 190, the second zone 192, the third zone 194, and the fourth zone 196.
- In the example of FIG. 1, a first zone region quantity of the first zone 190 is three (including the first region 170, the second region 172, and the third region 174); a second zone region quantity of the second zone 192 is four (including the fourth region 176, the fifth region 178, the sixth region 180, and the seventh region 182, while excluding the third region 174 due to being already associated with the first zone 190); a third zone region quantity of the third zone 194 is zero (excluding the sixth region 180 and the seventh region 182 due to being already associated with the second zone 192); and the fourth zone region quantity of the fourth zone 196 is zero.
- In the example of FIG. 1, the apparatus determines that the second zone 192 is the target zone, because the second zone 192 has the greatest quantity of regions. In some examples, if multiple zones have a same quantity of regions, the apparatus may select a zone that includes or is nearest to a midpoint between the top coordinate 184 and the bottom coordinate 186.
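The zone-counting and target-zone selection above can be sketched in Python. This is a simplified illustration, not the patent's implementation: each region is associated with the topmost zone its top edge falls in (a simplification of the overlap rule), and the tie-break picks the zone whose center is nearest the midpoint of the top and bottom coordinates.

```python
# Sketch: partition the image into equal horizontal zones, count regions
# per zone top-to-bottom, and pick the zone with the most regions.

def target_zone(regions, image_height, num_zones=4, top=0, bottom=None):
    if bottom is None:
        bottom = image_height
    zone_h = image_height / num_zones
    counts = [0] * num_zones
    for (x, y, w, h) in regions:
        # associate the region with the first (topmost) zone it reaches
        counts[min(int(y // zone_h), num_zones - 1)] += 1
    best = max(counts)
    midpoint = (top + bottom) / 2
    candidates = [i for i, c in enumerate(counts) if c == best]
    # tie-break: zone whose center is nearest the midpoint
    return min(candidates, key=lambda i: abs((i + 0.5) * zone_h - midpoint))
```

With seven regions distributed three in the top zone and four in the second zone (as in FIG. 1), the second zone (index 1) is selected.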
- An apparatus may determine a top region point associated with a zone (e.g., the target zone) in a source image.
- A top region point is a top value or coordinate of a region associated with a zone.
- The apparatus may determine a highest height coordinate (e.g., the minimum height coordinate or y coordinate) of a highest region associated with a zone.
- In FIG. 1, the top region point 198 for the second zone 192 is the height coordinate of the top of the fourth region 176.
- The apparatus may generate a macro view based on a top region point. For instance, the apparatus may determine whether a top region point is spatially below a level coordinate (e.g., half height or another quantity) of the source image. In some examples, the apparatus may determine whether a top region point is greater than the level coordinate (e.g., 1080/2). In the example of FIG. 1, the top region point 198 is less than (e.g., is not greater than) the level coordinate (e.g., the top region point 198 is spatially above the level coordinate, which is a half height in FIG. 1).
- An apparatus may determine a top boundary of a macro view based on a top region point. For instance, an apparatus may determine a top boundary of a macro view using an offset (e.g., a fraction of source image height, source image height/8, 1080/8, etc.) from a top region point.
- In FIG. 1, the top boundary of the macro view 188 may be positioned at the top region point 198 minus an eighth of the source image 160 height.
- The macro view 188 may be taken from the source image 160 and may be used to generate a compound image 262 (e.g., positioned at the top half of the compound image 262).
- The macro view 188 may be sized to be half the height of the source image 160, cropped from the top boundary.
- A macro view (e.g., the macro view 188) may be cropped from a source image (e.g., the source image 160), resized, scaled, shifted, or a combination thereof for inclusion in a compound image (e.g., the compound image 262).
- An apparatus may determine a bottom region point associated with a zone (e.g., the target zone) in a source image.
- A bottom region point is a bottom value or coordinate of a region associated with a zone.
- The apparatus may determine a lowest height coordinate (e.g., the maximum height coordinate or y coordinate) of a lowest region associated with a zone.
- In FIG. 1, the bottom region point for the second zone 192 (e.g., target zone) corresponds to the bottom coordinate 186.
- The apparatus may generate a macro view based on a bottom region point. For instance, the apparatus may determine whether a top region point is spatially below a level coordinate (e.g., half height or another quantity) of a source image. In a case that the top region point is spatially below the level coordinate, an apparatus may determine a top boundary of a macro view based on a bottom region point. For instance, an apparatus may determine the top boundary of the macro view using an offset (e.g., a fraction of source image height, source image height/2, 1080/2, etc.) from the bottom region point. For instance, a top boundary of a macro view may be positioned at the bottom region point minus half of a source image height. The macro view may be cropped out of a source image and used to generate a compound image (e.g., positioned at the top half of a compound image).
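The two top-boundary cases described above (top region point above or below the level coordinate) can be combined into one sketch. The half-height level coordinate, the height/8 and height/2 offsets, and the clamping at zero are taken from the examples in the text; everything else (names, float arithmetic) is illustrative.

```python
# Sketch: choose the macro-view top boundary. If the target zone's top
# region point is at or above the level coordinate (half the image
# height), offset an eighth of the height above the top region point;
# otherwise offset half the height above the bottom region point.

def macro_view_top_boundary(top_region_point, bottom_region_point, image_height):
    level = image_height / 2
    if top_region_point <= level:                       # spatially above the level
        return max(0, top_region_point - image_height / 8)
    return max(0, bottom_region_point - image_height / 2)
```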
- An apparatus may sort (e.g., order) people, regions, or a combination thereof according to a proximity to an image sensor.
- The people, regions, or combination thereof may be ordered based on region area, pixel quantity, or a combination thereof. For instance, an apparatus may sort regions from smallest region to largest region in terms of region area, pixel quantity, or a combination thereof.
- For example, the apparatus may determine a first quantity of pixels in the first region 170 and a second quantity of pixels in the second region 172.
- The apparatus may determine that the first person is further away (from the image sensor) than the second person when the first quantity of pixels in the first region 170 is less than the second quantity of pixels in the second region 172.
- In some examples, the apparatus may order the regions in FIG. 1 in the following order: first region 170, second region 172, third region 174, fourth region 176, fifth region 178, sixth region 180, and seventh region 182.
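The proximity sort above can be sketched in one line: region pixel count stands in for distance, so sorting by ascending area orders regions from farthest person to nearest. The (x, y, w, h) region format is an assumption for the example.

```python
# Sketch: smaller regions are assumed to depict people farther from the
# image sensor, so ascending-area order is farthest-to-nearest.

def sort_by_distance(regions):
    return sorted(regions, key=lambda r: r[2] * r[3])  # area = w * h
```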
- An apparatus may generate a focus cell(s) based on the sorting.
- A focus cell is an image that emphasizes a person (e.g., depicts a person alone, predominantly shows an individual, is approximately horizontally centered on a person's face, etc.).
- Generating a focus cell may include formatting image content in a region (e.g., scaling, cropping, or a combination thereof).
- A focus cell may provide a detailed view (e.g., zoomed-in view) of a person.
- An apparatus may generate focus cells corresponding to a set of people or regions that are furthest from the image sensor.
- For instance, the apparatus may generate a first focus cell 281 that depicts the first person alone.
- A person or set of people that is furthest from the image sensor may be prioritized for presentation in a focus cell(s).
- In some examples, the four furthest people away from the image sensor may be selected (e.g., initially selected) for presentation in a focus cell.
- The apparatus may generate the first focus cell 281 for the first person (e.g., first region 170).
- The apparatus may generate a second focus cell 283 that depicts the second person alone (e.g., for the second person, the second region 172, or a combination thereof), where the compound image 262 includes the first focus cell 281 and the second focus cell 283.
- The apparatus may generate a third focus cell 285 for the third person (e.g., third region 174) and a fourth focus cell 287 for the fourth person (e.g., fourth region 176).
- A quantity of focus cells may be fewer than a quantity of people detected, a quantity of people in a source image, or a quantity of people in a macro view.
- In FIG. 1 and FIG. 2, seven people appear in the source image 160 and in the macro view 188, while four focus cells are utilized in the compound image 262.
- A compound image is an image that includes a macro view and a focus cell(s). For instance, a compound image may concurrently show all people in a source image with a more detailed view(s) of an individual(s), may increase an immersive video conference experience, or a combination thereof.
- A compound image may be generated from a source image(s) from a single image sensor, from a single image stream, or a combination thereof.
- A compound image may provide a tidy layout of a macro view combined with focus cells. Different quantities of focus cells (e.g., 1, 2, 3, 4, 5, 6, etc.) may be utilized in a compound image in some examples.
- An apparatus may order focus cells along a horizontal dimension. For instance, an apparatus may order (e.g., initially order) focus cells according to an order of regions along a horizontal dimension in a source image.
- In FIG. 2, the focus cells are ordered from left to right as the second focus cell 283, the first focus cell 281, the third focus cell 285, and the fourth focus cell 287 according to the left-to-right ordering of the second region 172, the first region 170, the third region 174, and the fourth region 176.
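The initial selection and ordering of focus cells described above can be sketched as: pick the N farthest regions (smallest pixel areas), then lay them out left to right by horizontal position in the source image. The (x, y, w, h) region format and the default of four cells follow the examples; the names are assumptions.

```python
# Sketch: select the farthest people (smallest regions) for focus cells,
# then order the cells by the regions' left-to-right positions.

def order_focus_cells(regions, num_cells=4):
    farthest = sorted(regions, key=lambda r: r[2] * r[3])[:num_cells]
    return sorted(farthest, key=lambda r: r[0])  # left-to-right by x
```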
- An apparatus may detect a speaker (e.g., a person that is speaking). For instance, an apparatus may utilize machine learning, a computer vision technique, sound direction detection using signals from a microphone array, voice recognition, or a combination thereof to determine which person is speaking.
- A focus cell may be utilized to indicate a person that is speaking (even with a mask, for instance) based on the speaker detection.
- An apparatus may produce a speaker indicator to indicate which person is speaking.
- In FIG. 2, a speaker indicator 289 is illustrated as a box outline corresponding to the first focus cell 281.
- Examples of a speaker indicator may include a box, framing around a focus cell, a color overlay, a color outline around a speaker, a symbol, highlighted text, animated lines, or a combination thereof. Utilizing focus cells (e.g., a speaker indicator) may indicate changes in speakers.
- A focus cell(s) in a compound image may be changed based on a detected speaker(s). For example, if a person begins to speak that is not shown in a focus cell in a compound image, an apparatus may generate a focus cell corresponding to the new speaker and add the focus cell to the compound image. In some examples, a focus cell of a new speaker may be added as a rightmost focus cell in a compound image and a leftmost focus cell may be removed from the compound image.
- For instance, a leftmost focus cell may be removed from a compound image, another focus cell(s) may be shifted to the left, and a focus cell corresponding to the new speaker may be added at the right.
- In other examples, a rightmost focus cell may be removed from a compound image, another focus cell(s) may be shifted to the right, and a focus cell corresponding to the new speaker may be added at the left.
- Adding focus cells in a set pattern may indicate an order in which new speakers occur, which may help to indicate which person is talking to a user viewing a compound image.
- In some examples, a spatial order of people or regions in the source image may be maintained in the focus cells when a new speaker occurs. For instance, a focus cell of a speaker that spoke longest ago (out of the current set of focus cells, for example) may be removed and a focus cell of a new speaker may be added, where the horizontal spatial order of the people in a source image may be maintained when adding the focus cell of the new speaker (e.g., a focus cell(s) may be shifted, separated, moved together, or a combination thereof to maintain the spatial order).
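The spatial-order-preserving update described above can be sketched as follows. This is an illustrative sketch only: the (x_position, person_id) cell representation and the `last_spoken` timestamp map are assumed data structures, not structures named in the text.

```python
# Sketch: when a new speaker is not among the current focus cells, drop
# the cell of the person who spoke longest ago and insert the new
# speaker's cell, keeping cells in left-to-right source-image order.

def update_focus_cells(cells, new_speaker, last_spoken):
    """cells: list of (x_position, person_id); new_speaker: same shape;
    last_spoken: person_id -> time the person last spoke."""
    if any(pid == new_speaker[1] for (x, pid) in cells):
        return cells  # speaker is already shown in a focus cell
    oldest = min(cells, key=lambda c: last_spoken.get(c[1], 0))
    cells = [c for c in cells if c is not oldest] + [new_speaker]
    return sorted(cells, key=lambda c: c[0])  # maintain spatial order
```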
- An apparatus may generate a region indicator(s) in a compound image, where the region indicator(s) correspond to a focus cell(s) currently included in the compound image.
- In FIG. 2, a first region indicator 271 corresponding to the first focus cell 281, a second region indicator 273 corresponding to the second focus cell 283, a third region indicator 275 corresponding to the third focus cell 285, and a fourth region indicator 277 corresponding to the fourth focus cell 287 are provided in the macro view 188 of the compound image 262.
- An apparatus may cause display of (e.g., a processor may instruct display of) a compound image including a macro view and a focus cell. For instance, an apparatus may display a compound image on a display panel or may provide the compound image to a display device for display. In the example of FIG. 2, the compound image 262 may be displayed on a display panel or a display device.
- FIG. 3 is a block diagram illustrating an example of an electronic device 302 that may be used to generate a compound image.
- An electronic device may be a device that includes electronic circuitry. Examples of the electronic device 302 may include a computer (e.g., laptop computer), a smartphone, a tablet computer, a mobile device, a camera, etc. In some examples, the electronic device 302 may include or may be coupled to a processor 304, memory 306, an image sensor 310, or a combination thereof. In some examples, components of the electronic device 302 may be coupled via an interface(s) (e.g., bus(es), wire(s), connector(s), etc.).
- The electronic device 302 may include additional components (not shown), or some of the components described herein may be removed or modified without departing from the scope of this disclosure.
- The electronic device 302 may include the image sensor 310 (e.g., an integrated camera).
- The electronic device 302 may be in communication with a separate image sensor (e.g., camera).
- An image sensor (e.g., web cam, camera, infrared (IR) sensor, depth sensor, radar, etc.) may be attached to the electronic device 302 and may send an image(s) (e.g., video stream) to the electronic device 302.
- An image may include visual information, depth information, IR sensing information, or a combination thereof.
- The electronic device 302 may include a communication interface(s) (not shown in FIG. 3).
- The electronic device 302 may utilize the communication interface(s) to communicate with an external device(s) (e.g., networked device, server, smartphone, microphone, camera, printer, computer, keyboard, mouse, etc.).
- The electronic device 302 may be in communication with (e.g., coupled to, have a communication link with) a display device(s).
- The electronic device 302 may include an integrated display panel, touchscreen, button, microphone, or a combination thereof.
- The communication interface may include hardware, machine-readable instructions, or a combination thereof to enable a component (e.g., processor 304, memory 306, etc.) of the electronic device 302 to communicate with the external device(s).
- The communication interface may enable a wired connection, wireless connection, or a combination thereof to the external device(s).
- The communication interface may include a network interface card, may include hardware, may include machine-readable instructions, or may include a combination thereof to enable the electronic device 302 to communicate with an input device(s), an output device(s), or a combination thereof. Examples of output devices include a display device(s), speaker(s), headphone(s), etc. Examples of input devices include a keyboard, a mouse, a touchscreen, an image sensor, a microphone, etc.
- A user may input instructions or data into the electronic device 302 using an input device(s).
- The communication interface(s) may include a mobile industry processor interface (MIPI), a Universal Serial Bus (USB) interface, or a combination thereof.
- The source image 308 may be captured by the image sensor 310 or a separate image sensor (e.g., webcam).
- The communication interface(s) may be coupled to the processor 304, to the memory 306, or a combination thereof.
- The communication interface(s) may provide the source image 308 to the processor 304 or the memory 306 from the separate image sensor.
- The image sensor 310 may be a device to sense or capture image information (e.g., an image stream, video stream, etc.).
- The image sensor 310 may include an optical (e.g., visible spectrum) image sensor, a red-green-blue (RGB) sensor, an IR sensor, a depth sensor, etc., or a combination thereof.
- The image sensor 310 may be a device to capture optical (e.g., visual) image data (e.g., a sequence of video frames).
- The image sensor 310 may capture an image (e.g., series of images, video stream, etc.) of a scene.
- For instance, the image sensor 310 may capture video for a video conference, broadcast, recording, etc.
- The source image 308 may be a frame of a video stream.
- In some examples, the image sensor 310 may capture the source image 308 with a wide-angle field of view (e.g., 120° field of view).
- the memory 306 may be an electronic storage device, magnetic storage device, optical storage device, other physical storage device, or a combination thereof that contains or stores electronic information (e.g., instructions, data, or a combination thereof).
- the memory 306 may be, for example, Random Access Memory (RAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage device, an optical disc, the like, or a combination thereof.
- RAM Random Access Memory
- EEPROM Electrically Erasable Programmable Read-Only Memory
- the memory 306 may be volatile or non-volatile memory, such as Dynamic Random Access Memory (DRAM), EEPROM, magnetoresistive random-access memory (MRAM), phase change RAM (PCRAM), memristor, flash memory, the like, or a combination thereof.
- the memory 306 may be a non-transitory tangible machine-readable or computer-readable storage medium, where the term “non-transitory” does not encompass transitory propagating signals.
- the memory 306 may include multiple devices (e.g., a RAM card and a solid-state drive (SSD)).
- the memory 306 may be integrated into the processor 304 .
- the memory 306 may include (e.g., store) a source image 308 , region determination instructions 312 , compound image instructions 313 , or a combination thereof.
- the processor 304 is logic circuitry. Some examples of the processor 304 may include a general-purpose processor, a central processing unit (CPU), a graphics processing unit (GPU), a semiconductor-based microprocessor, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), another hardware device, or a combination thereof suitable for retrieval and execution of instructions stored in the memory 306 . In some examples, the processor 304 may be an application processor. In some examples, the processor 304 may perform one, some, or all of the aspects, operations, elements, etc., described in one, some, or all of FIG. 1 - 6 . For instance, the processor 304 may process an image(s) (e.g., perform an operation on the source image 308 ).
- the processor 304 may be logic circuitry to perform object detection, object tracking, feature point detection, region determination, region sorting, focus cell generation, macro view generation, etc., or a combination thereof.
- the processor 304 may execute instructions stored in the memory 306 .
- the processor 304 may include electronic circuitry that includes electronic components for performing an operation or operations described herein without the memory 306 .
- the processor 304 may perform one, some, or all of the aspects, operations, elements, etc., described in one, some, or all of FIG. 1 - 6 .
- the processor 304 may receive a source image 308 (e.g., image sensor stream, video stream, etc.). For instance, the processor 304 may receive the source image 308 from the image sensor 310 . In some examples, the processor 304 may receive the source image 308 (e.g., image sensor stream, video stream, etc.) from a separate image sensor. For instance, the processor 304 may receive an image stream via a wired or wireless communication interface (e.g., MIPI, USB port, Ethernet port, Bluetooth receiver, etc.).
- the processor 304 may execute the region determination instructions 312 to determine, in the source image 308 , a first region that depicts a first person and a second region that depicts a second person. For example, the processor 304 may execute the region determination instructions 312 to determine regions as described in FIG. 1 - 2 .
- a machine learning model may be trained to detect a region.
- a region may indicate a detected object (e.g., face, torso, body, etc.).
- a region may be a rectangular region that spans the dimensions of a detected object.
- a machine learning model may be trained using training images that depict an object (e.g., face, torso, body, etc.) for detection.
- a training image may be labeled with a region located around the object.
- the region determination instructions 312 may include a machine learning model trained to detect the first person, the second person, etc.
- the processor 304 may execute the region determination instructions 312 to determine the first region based on the detected first person (e.g., the first person's face) and to determine the second region based on the detected second person (e.g., the second person's face).
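The region determination described above (a bounding box around a detected face, optionally expanded by a margin as noted later in the FIG. 1 description) might be sketched as follows. The (x, y, w, h) box format, the margin ratio, and the helper name are illustrative assumptions, not part of the disclosure.

```python
def region_from_face(face, margin_ratio=0.25, image_size=(1920, 1080)):
    """Expand a detected face box (x, y, w, h) by a margin proportional
    to the face size, clamped to the image bounds."""
    x, y, w, h = face
    mx, my = int(w * margin_ratio), int(h * margin_ratio)
    x0, y0 = max(x - mx, 0), max(y - my, 0)
    x1 = min(x + w + mx, image_size[0])
    y1 = min(y + h + my, image_size[1])
    return (x0, y0, x1 - x0, y1 - y0)

# A 100x100 face at (400, 300) gains a 25-pixel margin on each side
region = region_from_face((400, 300, 100, 100))
# region == (375, 275, 150, 150)
```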
- the processor 304 may execute the compound image instructions 313 to generate a compound image.
- the processor 304 may sort the regions to determine a person or set of people that are furthest from the image sensor 310 .
- the person or set of people that is furthest from the image sensor 310 may be prioritized (e.g., initially prioritized) for focus cell generation, display, or a combination thereof.
- the processor 304 may execute the compound image instructions 313 to determine that the first person is further away than the second person relative to the image sensor 310 based on the first region and the second region.
- the processor 304 may determine that the first region includes fewer pixels than the second region.
- the processor 304 may execute the compound image instructions 313 to generate a focus cell that depicts the first person alone.
- the processor 304 may produce multiple focus cells.
- a focus cell may depict an individually tracked person that is framed in the focus cell.
- focus cells may depict the furthest people, as determined by face detection area determination and sorting (e.g., prioritization).
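A minimal sketch of the area-based sorting, assuming regions are (x, y, w, h) tuples and that a smaller face region corresponds to a person further from the image sensor (the helper name is hypothetical):

```python
def sort_regions_by_distance(regions):
    """Sort (x, y, w, h) regions from smallest area (likely the furthest
    person) to largest area (likely the nearest person)."""
    return sorted(regions, key=lambda r: r[2] * r[3])

regions = [
    (100, 200, 80, 80),  # large region: person near the sensor
    (900, 150, 30, 30),  # small region: person far from the sensor
    (500, 180, 50, 50),
]
prioritized = sort_regions_by_distance(regions)
# prioritized[0] is (900, 150, 30, 30), the furthest person's region
```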
- the processor 304 may execute the compound image instructions 313 to generate a macro view.
- the processor 304 may generate a macro view as described in FIG. 1 - 2 .
- the macro view may depict all people in the field of view of the source image 308 .
- the macro view may depict all people and an environment in the field of view of the image sensor 310 .
- the macro view may provide a view of interactions between people (e.g., attendees to a video conference), may reduce a view of distracting content (e.g., crop out areas of the source image 308 away from the people), or a combination thereof.
- the processor 304 may partition the source image 308 into zones. For instance, the processor 304 may partition the source image 308 into a first zone and a second zone. The processor 304 may determine a first zone region quantity of the first zone and a second zone region quantity of the second zone. The processor 304 may determine a target zone based on the first zone region quantity and the second zone region quantity. For instance, a zone with a greatest quantity of regions may be selected as the target zone. The target zone may be utilized to produce the macro view. For instance, depending on location of a top region point, the macro view may be produced relative to a top region point or a bottom region point as described in FIG. 1 - 2 . In some examples, the processor 304 may generate a macro view of the source image 308 that depicts the first person and the second person.
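The zone-based target selection might be sketched as follows, assuming the zones are horizontal bands of the source image and that a region belongs to the band containing its vertical center (both assumptions for illustration):

```python
def determine_target_zone(image_height, regions, zone_count=4):
    """Count region centers per horizontal band and return the index of
    the band with the greatest quantity of regions."""
    band_height = image_height / zone_count
    counts = [0] * zone_count
    for (_x, y, _w, h) in regions:
        center_y = y + h / 2
        idx = min(int(center_y // band_height), zone_count - 1)
        counts[idx] += 1
    return max(range(zone_count), key=lambda i: counts[i])

# Three regions whose centers (325, 375, 425) all fall in the second
# band of a 1080-pixel-tall image partitioned into four 270-pixel bands
zones_demo = determine_target_zone(
    1080, [(0, 300, 50, 50), (200, 350, 50, 50), (400, 400, 50, 50)])
# zones_demo == 1
```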
- the processor 304 may execute the compound image instructions 313 to generate a compound image. For instance, the processor 304 may combine a macro view and a focus cell(s) to produce the compound image.
- the processor 304 may execute the compound image instructions 313 to instruct display of a compound image and the focus cell(s). For instance, the electronic device 302 may provide the compound image to a display panel or display device for display. In some examples, the electronic device 302 may display a compound image including the macro view and the first focus cell.
- FIG. 4 is a block diagram illustrating an example of an apparatus 430 for compound image generation.
- the apparatus 430 may perform an aspect or aspects of the operations described in FIG. 1 , FIG. 2 , FIG. 3 , or a combination thereof.
- the apparatus 430 may be an example of the electronic device 302 described in FIG. 3 , or the electronic device 302 described in FIG. 3 may be an example of the apparatus 430 .
- the apparatus 430 may include a processor 418 and a communication interface 429 .
- Examples of the apparatus 430 may include a computing device, smartphone, laptop computer, tablet device, mobile device, etc.
- one, some, or all of the components of the apparatus 430 may be structured in hardware or circuitry.
- the apparatus 430 may perform one, some, or all of the operations described in FIG. 1 - 6 .
- An image sensor 414 may capture a source image 416 .
- the source image 416 may be a frame of a video stream.
- the source image 416 may depict a scene.
- the source image 416 may depict people in the scene.
- the source image 416 may depict a first person and a second person.
- the image sensor 414 may be an example of the image sensor 310 described in FIG. 3 .
- the source image 416 may be provided to the communication interface 429 .
- the communication interface 429 may receive the source image 416 from the image sensor 414 .
- the source image 416 may be provided to the processor 418 from the communication interface 429 .
- the processor 418 may be an example of the processor 304 described in FIG. 3 .
- the processor 304 described in FIG. 3 may be an example of the processor 418 .
- the processor 418 may determine, in the source image 416 , a first region that includes the first person and a second region that includes the second person. For instance, the processor 418 may determine a region(s) as described in one, some, or all of FIG. 1 - 3 .
- the processor 418 may determine that the first person is further away from the image sensor 414 than the second person based on a first area of the first region (e.g., first region width × first region height) and a second area of the second region. For instance, the processor 418 may sort the regions according to region area, pixel quantity, or a combination thereof, as described in one, some, or all of FIG. 1 - 3 to determine that the first person is further away than the second person.
- the processor 418 may determine a horizontal order of the first region and the second region based on a first horizontal position of the first region and a second horizontal position of the second region.
- the first horizontal position of the first region may be a horizontal coordinate (e.g., x coordinate, leftmost coordinate, center coordinate, etc.) of the first region.
- the second horizontal position of the second region may be a horizontal coordinate (e.g., x coordinate, leftmost coordinate, center coordinate, etc.) of the second region.
- the processor 418 may determine a horizontal order of regions (e.g., minimum to maximum horizontal coordinates, or maximum to minimum horizontal coordinates).
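A sketch of the horizontal ordering, assuming (x, y, w, h) regions sorted by horizontal center from minimum to maximum (the helper name is hypothetical):

```python
def order_left_to_right(regions):
    """Order (x, y, w, h) regions by horizontal center, minimum to
    maximum x coordinate, matching a left-to-right seating arrangement."""
    return sorted(regions, key=lambda r: r[0] + r[2] / 2)

ordered = order_left_to_right([(800, 0, 40, 40), (100, 0, 40, 40)])
# ordered[0] is the leftmost region, (100, 0, 40, 40)
```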
- the processor 418 may generate a first focus cell based on the first region and a second focus cell based on the second region. For instance, the processor 418 may generate focus cells as described in one, some, or all of FIG. 1 - 3 . In some examples, the processor 418 may perform an operation(s) on pixel data in the first region to produce the first focus cell, such as scaling, shifting, interpolation, transformation, or a combination thereof. In some examples, the processor 418 may perform an operation(s) on pixel data in the second region to produce the second focus cell, such as scaling, shifting, interpolation, transformation, or a combination thereof.
- the processor 418 may generate a macro view of the source image 416 that depicts the first person and the second person. For instance, the processor 418 may generate the macro view as described in one, some, or all of FIG. 1 - 3 . In some examples, the processor 418 may determine (e.g., select and crop out) a portion of the source image 416 . For instance, the processor 418 may determine a top coordinate (e.g., minimum y coordinate) of a top region in the source image 416 and a bottom coordinate (e.g., maximum y coordinate) of a bottom region in the source image 416 , and may determine whether a difference between the bottom coordinate and the top coordinate is greater than a threshold.
- the processor 418 may generate the macro view from the source image 416 between the bottom coordinate and the top coordinate in response to determining that the difference is not greater than the threshold. In a case that the difference is greater than the threshold, the processor 418 may determine the macro view by determining a target zone and selecting a portion of the source image 416 based on a top region point associated with the target zone, a bottom region point associated with the target zone, or a combination thereof. In some examples, the processor 418 may perform an operation(s) on the portion to produce the macro view, such as scaling, shifting, interpolation, transformation, or a combination thereof.
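The bound-and-threshold check might be sketched as follows; returning None to signal the target-zone fallback is an illustrative interface choice, and the (x, y, w, h) region format is an assumption:

```python
def simple_macro_bounds(regions, image_height=1080):
    """Return (top, bottom) crop bounds when all regions fit within half
    the image height; otherwise return None to signal that the
    target-zone path should be taken instead."""
    top = min(y for (_x, y, _w, _h) in regions)
    bottom = max(y + h for (_x, y, _w, h) in regions)
    if bottom - top <= image_height // 2:
        return (top, bottom)
    return None

# All faces fit within 540 pixels, so the crop spans them directly
bounds = simple_macro_bounds([(0, 200, 50, 60), (500, 400, 50, 60)])
# bounds == (200, 460)
```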
- the processor 418 may generate a compound image including the first focus cell and the second focus cell in the horizontal order and the macro view. For instance, the processor 418 may combine the macro view and the focus cells to produce the compound image. For instance, the processor 418 may generate the compound image including the macro view in the top half of the compound image and the focus cells in the bottom half of the compound image. In some examples, other arrangements may be utilized (e.g., macro view in the bottom half and focus cells in the top half, macro view in a top third and focus cells in a bottom two-thirds, etc.).
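The top-half/bottom-half arrangement can be sketched as rectangle arithmetic; the (x, y, w, h) tuple format and equal-width cells are assumptions for illustration:

```python
def compound_layout(width, height, cell_count):
    """Return the macro-view rectangle (top half of the compound image)
    and focus-cell rectangles (bottom half, side by side)."""
    macro = (0, 0, width, height // 2)
    cell_w = width // cell_count
    cells = [(i * cell_w, height // 2, cell_w, height // 2)
             for i in range(cell_count)]
    return macro, cells

macro, cells = compound_layout(1920, 1080, 2)
# macro == (0, 0, 1920, 540)
# cells == [(0, 540, 960, 540), (960, 540, 960, 540)]
```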
- the apparatus 430 may display the compound image, send the compound image to another device(s) for display, store the compound image, or a combination thereof.
- the remote device may display the compound image on an integrated display panel or provide the compound image to a display device coupled to the remote device for display.
- FIG. 5 is a flow diagram illustrating an example of a method 500 for macro view generation.
- the method 500 or a method 500 element(s) may be performed by an electronic device or apparatus (e.g., electronic device 302 , apparatus 430 , laptop computer, smartphone, tablet device, etc.).
- the method 500 may be performed by the electronic device 302 described in FIG. 3 or the apparatus 430 described in FIG. 4 .
- an apparatus may obtain a source image.
- the apparatus may obtain a source image as described in one, some, or all of FIG. 1 - 4 .
- the apparatus may capture the source image using an integrated image sensor or may receive the source image from a linked image sensor (e.g., connected web cam).
- the apparatus determines regions.
- the apparatus may determine regions as described in one, some, or all of FIG. 1 - 4 .
- the apparatus may perform face or person detection and produce a region for each detected face or person.
- the apparatus may determine a top coordinate and a bottom coordinate. In some examples, the apparatus may determine the top coordinate and the bottom coordinate of the regions as described in one, some, or all of FIG. 1 - 4 .
- the apparatus may partition the source image into zones.
- the apparatus may partition the source image into zones (e.g., four horizontal zones) as described in one, some, or all of FIG. 1 - 4 .
- the apparatus may determine zone region quantities. In some examples, the apparatus may determine zone region quantities as described in one, some, or all of FIG. 1 - 4 .
- the apparatus may determine a top region point and a bottom region point.
- the apparatus may determine the top region point and the bottom region point (for a zone(s), for instance) as described in one, some, or all of FIG. 1 - 4 .
- the apparatus may determine a top region point corresponding to a top coordinate (e.g., minimum y coordinate) of a top region associated with the target zone.
- the apparatus may determine a bottom region point corresponding to a bottom coordinate (e.g., maximum y coordinate) of a bottom region associated with the target zone.
- the apparatus may determine whether a difference between the bottom coordinate and the top coordinate is greater than a first threshold (e.g., half height of the source image, 1080/2, etc.). In some examples, the apparatus may make this determination as described in one, some, or all of FIG. 1 - 4 .
- the apparatus may generate a macro view from the source image between the top coordinate and the bottom coordinate at step 516 .
- the macro view may be extracted from (e.g., copied from, cropped from, etc.) the source image between the top coordinate and the bottom coordinate.
- the apparatus may determine a target zone based on the zone region quantities at step 518 .
- the apparatus may determine the target zone as described in one, some, or all of FIG. 1 - 4 . For instance, the apparatus may select a zone that has a greatest quantity of associated regions as the target zone.
- the apparatus may determine whether a top region point is greater than a second threshold (e.g., half height of the source image, 1080/2, etc.). In some examples, the apparatus may determine whether the top region point is greater than the second threshold as described in one, some, or all of FIG. 1 - 4 . In some examples, the first threshold and the second threshold may be the same quantity or different quantities.
- the apparatus may determine a macro view top boundary based on a bottom region point at step 522 .
- the apparatus may determine the macro view top boundary as described in one, some, or all of FIG. 1 - 4 .
- the apparatus may determine the macro view top boundary as the bottom region point minus a value (e.g., half height of the source image, 1080/2, second threshold, another value, etc.).
- the apparatus may generate a macro view from the source image based on the top boundary.
- the apparatus may generate the macro view based on the top boundary as described in one, some, or all of FIG. 1 - 4 .
- the apparatus may extract the macro view from (e.g., copy the macro view from, crop the macro view from) the source image from the top boundary with a macro view size (e.g., 1920×540, source image width and half height, etc.).
- the apparatus may extract the macro view from between the top boundary and the bottom region point.
- the apparatus may determine a macro view top boundary based on a top region point at step 524 .
- the apparatus may determine the macro view top boundary as described in one, some, or all of FIG. 1 - 4 . For instance, the apparatus may determine the macro view top boundary as the top region point minus a value (e.g., eighth height of the source image, 1080/8, margin size, another value, etc.).
- the apparatus may generate a macro view from the source image based on the top boundary.
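The branch described above (the comparison of the top region point against the second threshold, then step 522 or step 524) can be sketched as a single function, using the example values from the description (half height 540 and eighth height 135 for a 1080-pixel-tall image); the function name is hypothetical:

```python
def macro_top_boundary(top_point, bottom_point, image_height=1080):
    """If the target zone's top region point lies below the image
    midline, anchor the crop to the bottom region point (step 522);
    otherwise place a margin above the top region point (step 524)."""
    half = image_height // 2
    if top_point > half:
        return bottom_point - half
    return max(top_point - image_height // 8, 0)

low_people = macro_top_boundary(700, 900)   # 900 - 540 == 360
high_people = macro_top_boundary(200, 900)  # 200 - 135 == 65
```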
- the method 500 may include generating a compound view based on the macro view (and a focus cell(s), for instance), displaying the compound view, transmitting the compound view, saving the compound view, or a combination thereof.
- the apparatus may generate the compound view as described in one, some, or all of FIG. 1 - 4 .
- FIG. 6 is a block diagram illustrating an example of a computer-readable medium 650 for compound image generation.
- the computer-readable medium 650 is a non-transitory, tangible computer-readable medium.
- the computer-readable medium 650 may be, for example, RAM, DRAM, EEPROM, MRAM, PCRAM, a storage device, an optical disc, the like, or a combination thereof.
- the computer-readable medium 650 may be volatile memory, non-volatile memory, or a combination thereof.
- the memory 306 described in FIG. 3 may be an example of the computer-readable medium 650 described in FIG. 6 .
- the computer-readable medium 650 may include data (e.g., information, executable instructions, or a combination thereof). In some examples, the computer-readable medium 650 may include sorting instructions 652 , focus cell generation instructions 654 , macro view generation instructions 656 , compound image instructions 658 , or a combination thereof.
- the sorting instructions 652 may include instructions that, when executed, cause a processor of an electronic device to sort a plurality of regions of a source image to determine that a first person in a first region and a second person in a second region are furthest from an image sensor. In some examples, sorting a plurality of regions may be performed as described in one, some, or all of FIG. 1 - 5 .
- the focus cell generation instructions 654 may include instructions that, when executed, cause the processor to generate a first focus cell including the first person alone based on the first region and a second focus cell including the second person alone based on the second region. In some examples, generating focus cells may be performed as described in one, some, or all of FIG. 1 - 5 .
- the macro view generation instructions 656 may include instructions that, when executed, cause the processor to generate a macro view of the source image that depicts the first person and the second person. In some examples, generating the macro view may be performed as described in one, some, or all of FIG. 1 - 5 .
- the compound image instructions 658 may include instructions that, when executed, cause the processor to generate a compound image including the macro view, the first focus cell, and the second focus cell. In some examples, generating the compound image may be performed as described in one, some, or all of FIG. 1 - 5 . In some examples, the macro view occupies half (e.g., top half or bottom half) of the compound image.
- the compound image instructions 658 may include instructions that, when executed, cause the processor to remove the first focus cell and add a third focus cell in response to a detection of a third person speaking.
- the first focus cell may be removed from the compound image or may be omitted in a subsequent compound image based on a subsequent source image.
- removing a focus cell(s), adding a focus cell(s), or a combination thereof may be performed as described in one, some, or all of FIG. 1 - 5 .
- For instance, the first focus cell may be removed from a left side, the second focus cell may be shifted leftward, and the third focus cell may be added to a right side.
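The remove/shift/add behavior might be sketched with a simple list, where list order corresponds to left-to-right cell position; the cap of three cells and the helper name are assumptions for illustration:

```python
def update_focus_cells(cells, new_speaker, max_cells=3):
    """When a new speaker is detected, drop the leftmost cell, shift the
    remaining cells leftward, and append the new speaker's cell on the
    right (only once the cell set is full)."""
    cells = list(cells)
    if new_speaker in cells:
        return cells
    if len(cells) >= max_cells:
        cells.pop(0)           # remove from the left side
    cells.append(new_speaker)  # remaining cells shift left; add on the right
    return cells

cells = update_focus_cells(["first", "second"], "third")
cells = update_focus_cells(cells, "fourth")
# "first" is removed; cells == ["second", "third", "fourth"]
```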
- items described with the term “or a combination thereof” may mean an item or items.
- the phrase “A, B, C, or a combination thereof” may mean any of: A (without B and C), B (without A and C), C (without A and B), A and B (without C), B and C (without A), A and C (without B), or all of A, B, and C.
Abstract
In some examples, an electronic device includes an image sensor to capture a source image. In some examples, the electronic device includes a processor to determine, in the source image, a first region that depicts a first person and a second region that depicts a second person. In some examples, the processor is to, in response to determining that the first person is further away than the second person relative to the image sensor based on the first region and the second region, generate a first focus cell that depicts the first person alone. In some examples, the processor is to generate a macro view of the source image that depicts the first person and the second person. In some examples, the processor is to instruct display of a compound image including the macro view and the first focus cell.
Description
- Electronic technology has advanced to become virtually ubiquitous in society and has been used for many activities in society. For example, electronic devices are used to perform a variety of tasks, including work activities, communication, research, and entertainment. For instance, computers may be used to participate in virtual meetings over the Internet.
FIG. 1 is a diagram illustrating an example of a source image that may be utilized to generate a compound image in accordance with some examples of the techniques described herein;
FIG. 2 is a diagram illustrating an example of a compound image in accordance with some examples of the techniques described herein;
FIG. 3 is a block diagram illustrating an example of an electronic device that may be used to generate a compound image;
FIG. 4 is a block diagram illustrating an example of an apparatus for compound image generation;
FIG. 5 is a flow diagram illustrating an example of a method for macro view generation; and
FIG. 6 is a block diagram illustrating an example of a computer-readable medium for compound image generation.
- People in a video meeting may be displayed with different sizes due to the people being situated at different distances from a camera (in a meeting room, for instance). It may be difficult for a participant that is unfamiliar with other attendees to identify who is speaking.
- Some examples of the techniques described herein provide a combination of a wide angle view with an individual view of a person. For instance, showing an individual view may allow a remote participant to more clearly see a person that is sitting relatively farther from the camera (e.g., a person that may appear small in the wide angle view). In some examples, multiple individual views are generated. The individual views may be ordered (e.g., initially ordered) in accordance with an arrangement of people in the video meeting. In some examples, the order may be adjusted over time, which may indicate a speaking sequence.
- Throughout the drawings, similar reference numbers may designate similar or identical elements. When an element is referred to without a reference number, this may refer to the element generally, with or without limitation to any particular drawing or figure. In some examples, the drawings are not to scale and the size of some parts may be exaggerated to more clearly illustrate the example shown. Moreover, the drawings provide examples in accordance with the description. However, the description is not limited to the examples provided in the drawings.
FIG. 1 is a diagram illustrating an example of a source image 160 that may be utilized to generate a compound image in accordance with some examples of the techniques described herein. FIG. 2 is a diagram illustrating an example of a compound image 262 in accordance with some examples of the techniques described herein. FIG. 1 and FIG. 2 are described together. - A source image is a digital image captured by an image sensor (e.g., digital camera, web cam, etc.). In some examples, a source image depicts an environment (e.g., conference room, meeting room, office, family room, etc.) in which people (e.g., meeting participants, attendees, etc.) may be situated. In some examples, a source image may have a field of view that is greater than sixty degrees along a horizontal dimension. For instance, a source image may be captured using an image sensor with a wide-angle lens, a fisheye lens, etc. A source image may be captured by an image sensor (e.g., digital camera, web cam, etc.) and provided to an apparatus (e.g., electronic device, computing device, server, etc.). For instance, an apparatus may include an image sensor, may be coupled to an image sensor, may be in communication with an image sensor, or a combination thereof. In the example of
FIG. 1, the source image 160 depicts seven people situated around a conference table. - The apparatus may determine a region that depicts a person. For instance, the apparatus may include a processor to execute a machine learning model to detect a person (e.g., face, head and shoulders, etc.). Machine learning is a technique where a machine learning model (e.g., artificial neural network (ANN), convolutional neural network (CNN), etc.) is trained to perform a task based on a set of examples (e.g., data). Training a machine learning model may include determining weights corresponding to structures of the machine learning model. In some examples, artificial neural networks may be a kind of machine learning model that may be structured with nodes, layers, connections, or a combination thereof.
- In some examples, a machine learning model may be trained with a set of training images. For instance, a set of training images may include images of an object(s) for detection (e.g., images of a user, people, etc.). In some examples, the set of training images may be labeled with the class of object(s), location (e.g., region, bounding box, etc.) of object(s) in the images, or a combination thereof. The machine learning model may be trained to detect the object(s) by iteratively adjusting weights of the model(s) and evaluating a loss function(s). The trained machine learning model may be executed to detect the object(s) (with a degree of probability, for instance). For example, the
source image 160 may be utilized with computer vision techniques to detect an object(s) (e.g., a user, people, etc.). - In some examples, an apparatus uses machine learning, a computer vision technique(s), or a combination thereof to detect a person or people. For instance, an apparatus may detect a location of a person (e.g., face) in a source image and provide a region that includes (e.g., depicts) the person. For instance, the apparatus may produce a region (e.g., bounding box) around a detected face. In some examples, the region may be sized based on the size of the detected person in the image (e.g., sized proportionate to the size of the detected face, sized with a margin from a face edge or detected feature, such as eyes, etc.). In the example of
FIG. 1, a first region 170, a second region 172, a third region 174, a fourth region 176, a fifth region 178, a sixth region 180, and a seventh region 182 are provided. For instance, the first region 170 depicts a first person, the second region 172 depicts a second person, the third region 174 depicts a third person, the fourth region 176 depicts a fourth person, and so on. - In some examples, the apparatus may determine a top coordinate of a top region. A top region is a region that is nearest to the top (e.g., 0 in a height dimension or y dimension) of a source image. For instance, the apparatus may determine a region with a height coordinate that is nearest to the top of a source image (e.g., the smallest height coordinate or y coordinate). In the example of
FIG. 1, the top region is the first region 170 and the top coordinate 184 is the height coordinate of the top of the first region 170. In some examples, coordinates or values of an image may be expressed as increasing from left to right for a horizontal (e.g., width) dimension and increasing from top to bottom for a vertical (e.g., height) dimension. For instance, the source image 160 may have dimensions of 1920×1080, where an upper left corner of the source image 160 may have coordinates of (0, 0). Other coordinates, ordering (e.g., values increasing from bottom to top of an image, etc.), indexing, or a combination thereof may be utilized in some examples of the techniques described herein. Some examples of the techniques described herein may be given in terms of 1920×1080 image dimensions (e.g., resolution). Some examples of the techniques herein may utilize images with other dimensions (e.g., 3840×2160, 1280×720, etc.). - In some examples, the apparatus may determine a bottom coordinate of a bottom region. A bottom region is a region that is nearest to the bottom (e.g., maximum height in a height dimension or y dimension) of a source image. For instance, the apparatus may determine a region with a height coordinate that is nearest to the bottom of a source image (e.g., the largest height coordinate or y coordinate). In the example of
FIG. 1, the bottom region is the seventh region 182 and the bottom coordinate 186 is the height coordinate of the bottom of the seventh region 182. - A macro view is a view (e.g., portion) of an image (e.g., source image) that depicts multiple people. For instance, a macro view may include a complete width of a source image, a subset of the height of a source image, or a combination thereof. For example, the apparatus may generate a
macro view 188 of the source image 160 that depicts people (e.g., the first person, the second person, etc.). In some examples, a macro view may be cropped from a source image with a resolution to include every person in the field of view. In some examples, a macro view may be adjusted over time (e.g., over a sequence of source images) by monitoring the movement of people in the area according to person detection (e.g., machine learning detection). - In some examples, the apparatus may generate a macro view based on a top coordinate and a bottom coordinate. In
FIG. 1, for instance, the apparatus may generate the macro view 188 based on the top coordinate 184 and the bottom coordinate 186. In some examples, the apparatus may determine whether a difference between a bottom coordinate and a top coordinate is greater than a threshold. In some examples, the threshold is a fraction (e.g., a quarter, a third, a half, three-fifths, a percentage, etc.) of a vertical size of a source image. For instance, the apparatus may determine whether a difference 164 between the bottom coordinate 186 and the top coordinate 184 is greater than half the vertical size (e.g., half the total height) of the source image 160. In the example of FIG. 1, the difference 164 is greater than half the vertical size of the source image 160. - In some examples, in response to determining that a difference is not greater than the threshold, the apparatus may generate a macro view from a source image between a bottom coordinate and a top coordinate. For instance, if a difference is less than or equal to half the height of a source image, the portion of the source image between the bottom coordinate and the top coordinate may be utilized as the macro view (e.g., the portion may be cropped out of the source image and used to generate a part of a compound image, such as a top half or a bottom half).
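The threshold check described above can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation; the region format `(x, y, w, h)`, the function name, and the default threshold fraction are assumptions made for the example.

```python
def simple_macro_view(regions, image_height, threshold_fraction=0.5):
    """Pick a macro-view crop from detected person regions.

    regions: list of (x, y, w, h) boxes in pixel coordinates, with y
    increasing downward (upper-left origin, as in the text).
    Returns (top, bottom) crop rows when the regions fit within the
    threshold, or None when zone-based selection is needed instead.
    """
    top_coordinate = min(y for (x, y, w, h) in regions)          # top of the top region
    bottom_coordinate = max(y + h for (x, y, w, h) in regions)   # bottom of the bottom region
    difference = bottom_coordinate - top_coordinate
    if difference <= threshold_fraction * image_height:
        # Difference is not greater than the threshold: crop the source
        # image between the top coordinate and the bottom coordinate.
        return (top_coordinate, bottom_coordinate)
    return None  # fall back to target-zone selection
```

For a 1080-pixel-tall source image with regions spanning rows 200 to 370, the difference (170) is within half the height, so the crop between the coordinates is returned; a larger spread returns None.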
- In response to determining that a difference is greater than the threshold, the apparatus may determine a target zone. A target zone is a portion of a source image that includes a greatest quantity of regions. For example, an apparatus may partition a source image into zones. In the example of
FIG. 1, the source image 160 is partitioned into four zones: a first zone 190, a second zone 192, a third zone 194, and a fourth zone 196. In some examples, a different quantity of zones (e.g., two zones, three zones, five zones, etc.) may be utilized. In some examples, a zone may span the entire width of a source image, a subset of a height of a source image, or a combination thereof. In some examples, zones may be sized equally. For instance, the first zone 190, the second zone 192, the third zone 194, and the fourth zone 196 have a same size (e.g., area, quantity of pixels, etc.). - In some examples, an apparatus may determine zone region quantities. A zone region quantity is a quantity of regions associated with a zone. A region may be associated with a zone if the region is within (e.g., partially or completely within) a zone and is not associated with another zone. For instance, an apparatus may determine zone region quantities according to a sequence of zones (e.g., top zone to bottom zone). In the example of
FIG. 1, the apparatus may determine zone associations (e.g., count) in a sequence of the first zone 190, the second zone 192, the third zone 194, and the fourth zone 196. For instance, a first zone region quantity of the first zone 190 is three (including the first region 170, the second region 172, and the third region 174), a second zone region quantity of the second zone 192 is four (including the fourth region 176, the fifth region 178, the sixth region 180, and the seventh region 182, while excluding the third region 174 due to being already associated with the first zone 190), a third zone region quantity of the third zone 194 is zero (excluding the sixth region 180 and the seventh region 182 due to being already associated with the second zone 192), and a fourth zone region quantity of the fourth zone 196 is zero. In the example of FIG. 1, the apparatus determines that the second zone 192 is the target zone, because the second zone 192 has the greatest quantity of regions. In some examples, if multiple zones have a same quantity of regions, the apparatus may select a zone that includes or is nearest to a midpoint between the top coordinate 184 and the bottom coordinate 186. - In some examples, an apparatus may determine a top region point associated with a zone (e.g., the target zone) in a source image. A top region point is a top value or coordinate of a region associated with a zone. For instance, the apparatus may determine a highest height coordinate (e.g., the minimum height coordinate or y coordinate) of a highest region associated with a zone. In the example of
FIG. 1, the top region point 198 for the second zone 192 (e.g., target zone) is the height coordinate of the top of the fourth region 176. - In some examples, the apparatus may generate a macro view based on a top region point. For instance, the apparatus may determine whether a top region point is spatially below a level coordinate (e.g., half height or another quantity) of the source image. In some examples, the apparatus may determine whether a top region point is greater than the level coordinate (e.g., 1080/2). In the example of
FIG. 1, the top region point 198 is less than (e.g., is not greater than) the level coordinate (e.g., the top region point 198 is spatially above the level coordinate, which is a half height in FIG. 1). - In a case that a top region point is not spatially below (e.g., is at or above) a level coordinate, an apparatus may determine a top boundary of a macro view based on a top region point. For instance, an apparatus may determine a top boundary of a macro view using an offset (e.g., a fraction of source image height, source image height/8, 1080/8, etc.) from a top region point. In the example of
FIG. 1, the top boundary of the macro view 188 may be positioned at the top region point 198 minus an eighth of the source image 160 height. The macro view 188 may be taken from the source image 160 and may be used to generate a compound image 262 (e.g., positioned at the top half of the compound image 262). In some examples, the macro view 188 may be sized to be half the height of the source image 160, cropped from the top boundary. In some examples, a macro view (e.g., the macro view 188) may be cropped from a source image (e.g., source image 160), resized, scaled, shifted, or a combination thereof for inclusion in a compound image (e.g., compound image 262). - In some examples, an apparatus may determine a bottom region point associated with a zone (e.g., the target zone) in a source image. A bottom region point is a bottom value or coordinate of a region associated with a zone. For instance, the apparatus may determine a lowest height coordinate (e.g., the maximum height coordinate or y coordinate) of a lowest region associated with a zone. In the example of
FIG. 1, the bottom region point for the second zone 192 (e.g., target zone) is the height coordinate of the bottom of the seventh region 182. For instance, the bottom region point corresponds to the bottom coordinate 186 in the example of FIG. 1. - In some examples, the apparatus may generate a macro view based on a bottom region point. For instance, the apparatus may determine whether a top region point is spatially below a level coordinate (e.g., half height or another quantity) of a source image. In a case that the top region point is spatially below the level coordinate, an apparatus may determine a top boundary of a macro view based on a bottom region point. For instance, an apparatus may determine the top boundary of the macro view using an offset (e.g., a fraction of source image height, source image height/2, 1080/2, etc.) from the bottom region point. For instance, a top boundary of a macro view may be positioned at the bottom region point minus half of a source image height. The macro view may be cropped out of a source image and used to generate a compound image (e.g., positioned at the top half of a compound image).
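The zone-based fallback described above might be sketched as follows. This is an illustrative Python sketch under simplifying assumptions, not the patent's implementation: each region is associated with the zone containing its top edge (a rough stand-in for "first zone it falls within, top to bottom"), the midpoint tie-break for equal zone counts is omitted, and the height/8 and height/2 offsets follow the numeric examples in the text.

```python
def macro_view_top_boundary(regions, image_height, num_zones=4):
    """Select a target zone and derive the macro-view top boundary.

    regions: (x, y, w, h) boxes, y increasing downward.
    """
    zone_height = image_height / num_zones
    counts = [0] * num_zones
    assigned_tops = [[] for _ in range(num_zones)]
    assigned_bottoms = [[] for _ in range(num_zones)]
    for (x, y, w, h) in sorted(regions, key=lambda r: r[1]):
        # Associate each region with the zone containing its top edge
        # (assumption; the text associates a region with at most one zone).
        zone = min(int(y // zone_height), num_zones - 1)
        counts[zone] += 1
        assigned_tops[zone].append(y)
        assigned_bottoms[zone].append(y + h)
    # The target zone is the zone with the greatest quantity of regions.
    target = max(range(num_zones), key=lambda z: counts[z])
    top_region_point = min(assigned_tops[target])
    bottom_region_point = max(assigned_bottoms[target])
    level = image_height / 2
    if top_region_point <= level:
        # Top region point at or above the level coordinate:
        # offset by an eighth of the source image height.
        top_boundary = top_region_point - image_height / 8
    else:
        # Top region point below the level coordinate:
        # measure up half the image height from the bottom region point.
        top_boundary = bottom_region_point - image_height / 2
    return max(0, top_boundary)
```

With a 1080-pixel-tall image, a target zone whose top region point is at row 300 (above the 540 level) yields a top boundary of 300 − 135 = 165.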
- In some examples, an apparatus may sort (e.g., order) people, regions, or a combination thereof according to a proximity to an image sensor. In some examples, the people, regions, or combination thereof may be ordered based on region area, pixel quantity, or a combination thereof. For instance, an apparatus may sort regions from smallest region to largest region in terms of region area, pixel quantity, or a combination thereof. In the example of
FIG. 1, the apparatus may determine a first quantity of pixels in the first region 170 and a second quantity of pixels in the second region 172. The apparatus may determine that the first person is further away (from the image sensor) than the second person when the first quantity of pixels in the first region 170 is less than the second quantity of pixels in the second region 172. For instance, the apparatus may order the regions in FIG. 1 in the following order: first region 170, second region 172, third region 174, fourth region 176, fifth region 178, sixth region 180, and seventh region 182. - In some examples, an apparatus may generate a focus cell(s) based on the sorting. A focus cell is an image that emphasizes a person (e.g., depicts a person alone, predominantly shows an individual, is approximately horizontally centered on a person's face, etc.). In some examples, generating a focus cell may include formatting image content in a region (e.g., scaling, cropping, or a combination thereof). A focus cell may provide a detailed view (e.g., zoomed-in view) of a person. In some examples, an apparatus may generate focus cells corresponding to a set of people or regions that are furthest from the image sensor. For instance, in response to determining that the first person is further away than the second person relative to the image sensor based on the
first region 170 and the second region 172, the apparatus may generate a first focus cell 281 that depicts the first person alone. In some examples, a person or set of people that is furthest from the image sensor may be prioritized for presentation in a focus cell(s). In the examples of FIG. 1 and FIG. 2, the four people furthest from the image sensor may be selected (e.g., initially selected) for presentation in a focus cell. For instance, the apparatus may generate the first focus cell 281 for the first person (e.g., first region 170). The apparatus may generate a second focus cell 283 that depicts the second person alone (e.g., for the second person, second region 172, or a combination thereof), where the compound image 262 includes the first focus cell 281 and the second focus cell 283. In some examples, the apparatus may generate a third focus cell 285 for the third person (e.g., third region 174) and a fourth focus cell 287 for the fourth person (e.g., fourth region 176). - In some examples, a quantity of focus cells may be fewer than a quantity of people detected, a quantity of people in a source image, or a quantity of people in a macro view. In the examples of
FIG. 1 and FIG. 2, seven people appear in the source image 160 and in the macro view 188, while four focus cells are utilized in the compound image 262. A compound image is an image that includes a macro view and a focus cell(s). For instance, a compound image may concurrently show all people in a source image with a more detailed view(s) of an individual(s), may provide a more immersive video conference experience, or a combination thereof. In some examples, a compound image may be generated from a source image(s) from a single image sensor, from a single image stream, or a combination thereof. In some examples, a compound image may provide a tidy layout of a macro view combined with focus cells. Different quantities of focus cells (e.g., 1, 2, 3, 4, 5, 6, etc.) may be utilized in a compound image in some examples. - In some examples, an apparatus may order focus cells along a horizontal dimension. For instance, an apparatus may order (e.g., initially order) focus cells according to an order of regions along a horizontal dimension in a source image. In the example of
FIG. 1, the focus cells are ordered from left to right as the second focus cell 283, the first focus cell 281, the third focus cell 285, and the fourth focus cell 287 according to the left-to-right ordering of the second region 172, the first region 170, the third region 174, and the fourth region 176. - In some examples, an apparatus may detect a speaker (e.g., a person that is speaking). For instance, an apparatus may utilize machine learning, a computer vision technique, sound direction detection using signals from a microphone array, voice recognition, or a combination thereof to determine which person is speaking. A focus cell may be utilized to indicate a person that is speaking (even with a mask, for instance) based on the speaker detection. In some examples, an apparatus may produce a speaker indicator to indicate which person is speaking. In the example of
FIG. 2, a speaker indicator 289 is illustrated as a box outline corresponding to the first focus cell 281. Examples of a speaker indicator may include a box, framing around a focus cell, a color overlay, a color outline around a speaker, a symbol, highlighted text, animated lines, or a combination thereof. Utilizing focus cells (e.g., a speaker indicator) may indicate changes in speakers. - In some examples, a focus cell(s) in a compound image may be changed based on a detected speaker(s). For example, if a person who is not shown in a focus cell in a compound image begins to speak, an apparatus may generate a focus cell corresponding to the new speaker and add the focus cell to the compound image. In some examples, a focus cell of a new speaker may be added as a rightmost focus cell in a compound image and a leftmost focus cell may be removed from the compound image. For instance, each time a new speaker (e.g., a speaker not currently shown in a focus cell) is detected, a leftmost focus cell may be removed from a compound image, another focus cell(s) may be shifted to the left, and a focus cell corresponding to the new speaker may be added at the right. In some examples, a rightmost focus cell may be removed from a compound image, another focus cell(s) may be shifted to the right, and a focus cell corresponding to the new speaker may be added at the left. Adding focus cells in a set pattern may indicate an order in which new speakers occur, which may help indicate to a user viewing a compound image which person is talking. In some examples, a spatial order of people or regions in the source image may be maintained in the focus cells when a new speaker occurs.
For instance, a focus cell of a speaker that spoke longest ago (out of the current set of focus cells, for example) may be removed and a focus cell of a new speaker may be added, where the horizontal spatial order of the people in a source image may be maintained when adding the focus cell of the new speaker (e.g., a focus cell(s) may be shifted, separated, moved together, or a combination thereof to maintain the spatial order).
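The order-preserving update on a speaker change could be sketched as follows. This is an illustrative Python sketch; the data shapes (a last-spoke timestamp per person and a horizontal position per person) and all names are assumptions made for the example, not the patent's implementation.

```python
def update_focus_cells(focus_people, speak_times, new_speaker, x_positions, max_cells=4):
    """Update which people are shown in focus cells when a new speaker is detected.

    focus_people: people currently shown in focus cells.
    speak_times: person -> time of most recent speech (never-spoke maps to a very old time).
    x_positions: person -> horizontal position of the person's region in the source image.
    """
    if new_speaker in focus_people:
        # Speaker already shown: keep the set, preserving left-to-right order.
        return sorted(focus_people, key=lambda p: x_positions[p])
    cells = list(focus_people)
    if len(cells) >= max_cells:
        # Remove the person whose most recent speech is longest ago.
        stalest = min(cells, key=lambda p: speak_times.get(p, float("-inf")))
        cells.remove(stalest)
    cells.append(new_speaker)
    # Maintain the horizontal spatial order of the people in the source image.
    return sorted(cells, key=lambda p: x_positions[p])
```

For instance, with four focus cells and a fifth person starting to speak, the person who spoke longest ago is dropped and the remaining cells are re-sorted by horizontal position.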
- In some examples, an apparatus may generate a region indicator(s) in a compound image, where the region indicator(s) correspond to a focus cell(s) currently included in the compound image. In the example of
FIG. 2, a first region indicator 271 corresponding to the first focus cell 281, a second region indicator 273 corresponding to the second focus cell 283, a third region indicator 275 corresponding to the third focus cell 285, and a fourth region indicator 277 corresponding to the fourth focus cell 287 are provided in the macro view 188 of the compound image 262. - In some examples, an apparatus may cause display of (e.g., a processor may instruct display of) a compound image including a macro view and a focus cell. For instance, an apparatus may display a compound image on a display panel or may provide the compound image to a display device for display. In the example of
FIG. 2, the compound image 262 may be displayed on a display panel or a display device. -
FIG. 3 is a block diagram illustrating an example of an electronic device 302 that may be used to generate a compound image. An electronic device may be a device that includes electronic circuitry. Examples of the electronic device 302 may include a computer (e.g., laptop computer), a smartphone, a tablet computer, a mobile device, a camera, etc. In some examples, the electronic device 302 may include or may be coupled to a processor 304, memory 306, an image sensor 310, or a combination thereof. In some examples, components of the electronic device 302 may be coupled via an interface(s) (e.g., bus(es), wire(s), connector(s), etc.). The electronic device 302 may include additional components (not shown), or some of the components described herein may be removed or modified without departing from the scope of this disclosure. In some examples, the electronic device 302 may include the image sensor 310 (e.g., an integrated camera). In some examples, the electronic device 302 may be in communication with a separate image sensor (e.g., camera). For instance, an image sensor (e.g., web cam, camera, infrared (IR) sensor, depth sensor, radar, etc.) may be attached to the electronic device and may send an image(s) (e.g., video stream) to the electronic device 302. In some examples, an image may include visual information, depth information, IR sensing information, or a combination thereof. - In some examples, the
electronic device 302 may include a communication interface(s) (not shown in FIG. 3). The electronic device 302 may utilize the communication interface(s) to communicate with an external device(s) (e.g., networked device, server, smartphone, microphone, camera, printer, computer, keyboard, mouse, etc.). In some examples, the electronic device 302 may be in communication with (e.g., coupled to, have a communication link with) a display device(s). In some examples, the electronic device 302 may include an integrated display panel, touchscreen, button, microphone, or a combination thereof. - In some examples, the communication interface may include hardware, machine-readable instructions, or a combination thereof to enable a component (e.g.,
processor 304, memory 306, etc.) of the electronic device 302 to communicate with the external device(s). In some examples, the communication interface may enable a wired connection, wireless connection, or a combination thereof to the external device(s). In some examples, the communication interface may include a network interface card, may include hardware, may include machine-readable instructions, or may include a combination thereof to enable the electronic device 302 to communicate with an input device(s), an output device(s), or a combination thereof. Examples of output devices include a display device(s), speaker(s), headphone(s), etc. Examples of input devices include a keyboard, a mouse, a touchscreen, an image sensor, a microphone, etc. In some examples, a user may input instructions or data into the electronic device 302 using an input device(s). - In some examples, the communication interface(s) may include a mobile industry processor interface (MIPI), a Universal Serial Bus (USB) interface, or a combination thereof. The
image sensor 310 or a separate image sensor (e.g., webcam) may be utilized to capture a source image 308 and provide the source image 308 to the electronic device 302 (e.g., to the processor 304 or the memory 306). In some examples, the communication interface(s) (e.g., MIPI, USB interface, etc.) may be coupled to the processor 304, to the memory 306, or a combination thereof. The communication interface(s) may provide the source image 308 to the processor 304 or the memory 306 from the separate image sensor. - The
image sensor 310 may be a device to sense or capture image information (e.g., an image stream, video stream, etc.). Some examples of the image sensor 310 may include an optical (e.g., visible spectrum) image sensor, red-green-blue (RGB) sensor, IR sensor, depth sensor, etc., or a combination thereof. For instance, the image sensor 310 may be a device to capture optical (e.g., visual) image data (e.g., a sequence of video frames). The image sensor 310 may capture an image (e.g., series of images, video stream, etc.) of a scene. For instance, the image sensor 310 may capture video for a video conference, broadcast, recording, etc. In some examples, the source image 308 may be a frame of a video stream. In some examples, the image sensor 310 may capture the source image 308 with a wide-angle field of view (e.g., 120° field of view). - In some examples, the
memory 306 may be an electronic storage device, magnetic storage device, optical storage device, other physical storage device, or a combination thereof that contains or stores electronic information (e.g., instructions, data, or a combination thereof). In some examples, the memory 306 may be, for example, Random Access Memory (RAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage device, an optical disc, the like, or a combination thereof. In some examples, the memory 306 may be volatile or non-volatile memory, such as Dynamic Random Access Memory (DRAM), EEPROM, magnetoresistive random-access memory (MRAM), phase change RAM (PCRAM), memristor, flash memory, the like, or a combination thereof. In some examples, the memory 306 may be a non-transitory tangible machine-readable or computer-readable storage medium, where the term “non-transitory” does not encompass transitory propagating signals. In some examples, the memory 306 may include multiple devices (e.g., a RAM card and a solid-state drive (SSD)). In some examples, the memory 306 may be integrated into the processor 304. In some examples, the memory 306 may include (e.g., store) a source image 308, region determination instructions 312, compound image instructions 313, or a combination thereof. - The
processor 304 is logic circuitry. Some examples of the processor 304 may include a general-purpose processor, a central processing unit (CPU), a graphics processing unit (GPU), a semiconductor-based microprocessor, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), another hardware device, or a combination thereof suitable for retrieval and execution of instructions stored in the memory 306. In some examples, the processor 304 may be an application processor. In some examples, the processor 304 may perform one, some, or all of the aspects, operations, elements, etc., described in one, some, or all of FIG. 1-6. For instance, the processor 304 may process an image(s) (e.g., perform an operation on the source image 308). In some examples, the processor 304 may be logic circuitry to perform object detection, object tracking, feature point detection, region determination, region sorting, focus cell generation, macro view generation, etc., or a combination thereof. The processor 304 may execute instructions stored in the memory 306. In some examples, the processor 304 may include electronic circuitry that includes electronic components for performing an operation or operations described herein without the memory 306. - In some examples, the
processor 304 may receive a source image 308 (e.g., image sensor stream, video stream, etc.). For instance, the processor 304 may receive the source image 308 from the image sensor 310. In some examples, the processor 304 may receive the source image 308 (e.g., image sensor stream, video stream, etc.) from a separate image sensor. For instance, the processor 304 may receive an image stream via a wired or wireless communication interface (e.g., MIPI, USB port, Ethernet port, Bluetooth receiver, etc.). - In some examples, the
processor 304 may execute the region determination instructions 312 to determine, in the source image 308, a first region that depicts a first person and a second region that depicts a second person. For example, the processor 304 may execute the region determination instructions 312 to determine regions as described in FIG. 1-2. In some examples, a machine learning model may be trained to detect a region. A region may indicate a detected object (e.g., face, torso, body, etc.). For instance, a region may be a rectangular region that spans the dimensions of a detected object. In some examples, a machine learning model may be trained using training images that depict an object (e.g., face, torso, body, etc.) for detection. A training image may be labeled with a region located around the object. For instance, the region determination instructions 312 may include a machine learning model trained to detect the first person, the second person, etc. In some examples, the processor 304 may execute the region determination instructions 312 to determine the first region based on the detected first person (e.g., the first person's face) and to determine the second region based on the detected second person (e.g., the second person's face). - In some examples, the
processor 304 may execute the compound image instructions 313 to generate a compound image. In some examples, the processor 304 may sort the regions to determine a person or set of people that are furthest from the image sensor 310. The person or set of people that is furthest from the image sensor 310 may be prioritized (e.g., initially prioritized) for focus cell generation, display, or a combination thereof. For instance, the processor 304 may execute the compound image instructions 313 to determine that the first person is further away than the second person relative to the image sensor 310 based on the first region and the second region. For example, the processor 304 may determine that the first region includes fewer pixels than the second region. The processor 304 may execute the compound image instructions 313 to generate a focus cell that depicts the first person alone. In some examples, the processor 304 may produce multiple focus cells. A focus cell may depict an individually tracked person that is framed in the focus cell. In some examples, focus cells may depict the furthest people by face detection area determination and sorting (e.g., prioritization). - In some examples, the
processor 304 may execute the compound image instructions 313 to generate a macro view. For instance, the processor 304 may generate a macro view as described in FIG. 1-2. In some examples, the macro view may depict all people in the field of view of the source image 308. For instance, the macro view may depict all people and an environment in the field of view of the image sensor 310. The macro view may provide a view of interactions between people (e.g., attendees of a video conference), may reduce a view of distracting content (e.g., crop out areas of the source image 308 away from the people), or a combination thereof. - In some examples, the
processor 304 may partition the source image 308 into zones. For instance, the processor 304 may partition the source image 308 into a first zone and a second zone. The processor 304 may determine a first zone region quantity of the first zone and a second zone region quantity of the second zone. The processor 304 may determine a target zone based on the first zone region quantity and the second zone region quantity. For instance, a zone with a greatest quantity of regions may be selected as the target zone. The target zone may be utilized to produce the macro view. For instance, depending on the location of a top region point, the macro view may be produced relative to a top region point or a bottom region point as described in FIG. 1-2. In some examples, the processor 304 may generate a macro view of the source image 308 that depicts the first person and the second person. - In some examples, the
processor 304 may execute the compound image instructions 313 to generate a compound image. For instance, the processor 304 may combine a macro view and a focus cell(s) to produce the compound image. In some examples, the electronic device 302 (e.g., communication interface) may send the compound image to a remote device(s). For instance, the compound image may be sent to a remote device(s) participating in a video conference (e.g., online meeting, video call, etc.). - In some examples, the
processor 304 may execute the compound image instructions 313 to instruct display of a compound image and the focus cell(s). For instance, the electronic device 302 may provide the compound image to a display panel or display device for display. In some examples, the electronic device 302 may display a compound image including the macro view and the first focus cell. -
FIG. 4 is a block diagram illustrating an example of an apparatus 430 for compound image generation. In some examples, the apparatus 430 may perform an aspect or aspects of the operations described in FIG. 1, FIG. 2, FIG. 3, or a combination thereof. In some examples, the apparatus 430 may be an example of the electronic device 302 described in FIG. 3, or the electronic device 302 described in FIG. 3 may be an example of the apparatus 430. In some examples, the apparatus 430 may include a processor 418 and a communication interface 429. Examples of the apparatus 430 may include a computing device, smartphone, laptop computer, tablet device, mobile device, etc. In some examples, one, some, or all of the components of the apparatus 430 may be structured in hardware or circuitry. In some examples, the apparatus 430 may perform one, some, or all of the operations described in FIG. 1-6. - An
image sensor 414 may capture a source image 416. For instance, the source image 416 may be a frame of a video stream. In some examples, the source image 416 may depict a scene. In some examples, the source image 416 may depict people in the scene. For instance, the source image 416 may depict a first person and a second person. In some examples, the image sensor 414 may be an example of the image sensor 310 described in FIG. 3. The source image 416 may be provided to the communication interface 429. - The communication interface 429 may receive the
source image 416 from the image sensor 414. In some examples, the source image 416 may be provided to the processor 418 from the communication interface 429. In some examples, the processor 418 may be an example of the processor 304 described in FIG. 3. In some examples, the processor 304 described in FIG. 3 may be an example of the processor 418. - The
processor 418 may determine, in the source image 416, a first region that includes the first person and a second region that includes the second person. For instance, the processor 418 may determine a region(s) as described in one, some, or all of FIG. 1-3. - The
processor 418 may determine that the first person is further away from the image sensor 414 than the second person based on a first area of the first region (e.g., first region width × first region height) and a second area of the second region. For instance, the processor 418 may sort the regions according to region area, pixel quantity, or a combination thereof, as described in one, some, or all of FIG. 1-3, to determine that the first person is further away than the second person. - The
processor 418 may determine a horizontal order of the first region and the second region based on a first horizontal position of the first region and a second horizontal position of the second region. The first horizontal position of the first region may be a horizontal coordinate (e.g., x coordinate, leftmost coordinate, center coordinate, etc.) of the first region. The second horizontal position of the second region may be a horizontal coordinate (e.g., x coordinate, leftmost coordinate, center coordinate, etc.) of the second region. For instance, the processor 418 may determine a horizontal order of regions (e.g., minimum to maximum horizontal coordinates, or maximum to minimum horizontal coordinates). - The
processor 418 may generate a first focus cell based on the first region and a second focus cell based on the second region. For instance, the processor 418 may generate focus cells as described in one, some, or all of FIG. 1-3. In some examples, the processor 418 may perform an operation(s) on pixel data in the first region to produce the first focus cell, such as scaling, shifting, interpolation, transformation, or a combination thereof. In some examples, the processor 418 may perform an operation(s) on pixel data in the second region to produce the second focus cell, such as scaling, shifting, interpolation, transformation, or a combination thereof. - The
processor 418 may generate a macro view of the source image 416 that depicts the first person and the second person. For instance, the processor 418 may generate the macro view as described in one, some, or all of FIG. 1-3. In some examples, the processor 418 may determine (e.g., select and crop out) a portion of the source image 416. For instance, the processor 418 may determine a top coordinate (e.g., minimum y coordinate) of a top region in the source image 416 and a bottom coordinate (e.g., maximum y coordinate) of a bottom region in the source image 416, and may determine whether a difference between the bottom coordinate and the top coordinate is greater than a threshold. In some examples, the threshold may be a fraction of a vertical size (e.g., half of a total height, 1080*0.5=540 pixels, etc.) of the source image 416. In some examples, the processor 418 may generate the macro view from the source image 416 between the bottom coordinate and the top coordinate in response to determining that the difference is not greater than the threshold. In a case that the difference is greater than the threshold, the processor 418 may determine the macro view by determining a target zone and selecting a portion of the source image 416 based on a top region point associated with the target zone, a bottom region point associated with the target zone, or a combination thereof. In some examples, the processor 418 may perform an operation(s) on the portion to produce the macro view, such as scaling, shifting, interpolation, transformation, or a combination thereof. - The
processor 418 may generate a compound image including the first focus cell and the second focus cell in the horizontal order and the macro view. For instance, the processor 418 may combine the macro view and the focus cells to produce the compound image. For instance, the processor 418 may generate the compound image including the macro view in the top half of the compound image and the focus cells in the bottom half of the compound image. In some examples, other arrangements may be utilized (e.g., macro view in the bottom half and focus cells in the top half, macro view in a top third and focus cells in a bottom two-thirds, etc.). - In some examples, the apparatus 430 may display the compound image, send the compound image to another device(s) for display, store the compound image, or a combination thereof. For instance, the apparatus 430 (e.g., communication interface 429) may transmit the compound image to a remote device (e.g., server, computing device, etc.) that is participating in a video conference with the apparatus 430. The remote device may display the compound image on an integrated display panel or provide the compound image to a display device coupled to the remote device for display.
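The area-based distance comparison and left-to-right ordering described above can be illustrated with a minimal sketch. The (x, y, width, height) tuple format and the helper names are illustrative assumptions, not part of the disclosure:

```python
def order_by_distance(regions):
    """Sort regions from the farthest person to the nearest.

    A smaller region area (width * height) is treated as depicting
    a person who is farther from the image sensor.
    """
    return sorted(regions, key=lambda r: r[2] * r[3])

def order_horizontally(regions):
    """Sort regions left to right by their leftmost x coordinate."""
    return sorted(regions, key=lambda r: r[0])

# A small (far) region and a large (near) region from a person detector.
far_region = (1000, 200, 80, 120)    # area 9600
near_region = (100, 400, 200, 300)   # area 60000
by_distance = order_by_distance([near_region, far_region])
left_to_right = order_horizontally([near_region, far_region])
```

Sorting by pixel quantity inside each region, as the description also permits, would only change the sort key, not the structure of the comparison.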
-
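The compound-image layout described above (macro view in the top half, focus cells side by side in the bottom half) might be sketched as follows, using single-channel list-of-lists pixel data; the dimensions and the function name are assumptions for illustration:

```python
def compose_compound(macro_view, focus_cells, width=1920, height=1080):
    """Stack the macro view over a row of focus cells."""
    half = height // 2
    compound = [row[:width] for row in macro_view[:half]]  # top half
    for j in range(half):                                  # bottom half
        row = []
        for cell in focus_cells:
            row.extend(cell[j])                            # cells left to right
        compound.append(row[:width])
    return compound

# Synthetic inputs: a 1920x540 macro view and two 960x540 focus cells.
macro = [[1] * 1920 for _ in range(540)]
cells = [[[2] * 960 for _ in range(540)],
         [[3] * 960 for _ in range(540)]]
compound = compose_compound(macro, cells)
```

The other arrangements mentioned (macro view in the bottom half, or a top-third split) would change only where the two strips are appended.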
FIG. 5 is a flow diagram illustrating an example of a method 500 for macro view generation. In some examples, the method 500 or a method 500 element(s) may be performed by an electronic device or apparatus (e.g., electronic device 302, apparatus 430, laptop computer, smartphone, tablet device, etc.). For example, the method 500 may be performed by the electronic device 302 described in FIG. 3 or the apparatus 430 described in FIG. 4. - At
step 502, an apparatus may obtain a source image. In some examples, the apparatus may obtain a source image as described in one, some, or all of FIG. 1-4. For instance, the apparatus may capture the source image using an integrated image sensor or may receive the source image from a linked image sensor (e.g., connected webcam). - At
step 504, the apparatus may determine regions. In some examples, the apparatus may determine regions as described in one, some, or all of FIG. 1-4. For instance, the apparatus may perform face or person detection and produce a region for each detected face or person. - At
step 506, the apparatus may determine a top coordinate and a bottom coordinate. In some examples, the apparatus may determine the top coordinate and the bottom coordinate of the regions as described in one, some, or all of FIG. 1-4. - At
step 508, the apparatus may partition the source image into zones. In some examples, the apparatus may partition the source image into zones (e.g., four horizontal zones) as described in one, some, or all of FIG. 1-4. - At
step 510, the apparatus may determine zone region quantities. In some examples, the apparatus may determine zone region quantities as described in one, some, or all of FIG. 1-4. - At
step 512, the apparatus may determine a top region point and a bottom region point. In some examples, the apparatus may determine the top region point and the bottom region point (for a zone(s), for instance) as described in one, some, or all of FIG. 1-4. For instance, the apparatus may determine a top region point corresponding to a top coordinate (e.g., minimum y coordinate) of a top region associated with the target zone. The apparatus may determine a bottom region point corresponding to a bottom coordinate (e.g., maximum y coordinate) of a bottom region associated with the target zone. - At
step 514, the apparatus may determine whether a difference between the bottom coordinate and the top coordinate is greater than a first threshold (e.g., half height of the source image, 1080/2, etc.). In some examples, the apparatus may determine whether the difference is greater than the first threshold as described in one, some, or all of FIG. 1-4. - In a case that the difference is not greater than the first threshold, the apparatus may generate a macro view from the source image between the top coordinate and the bottom coordinate at
step 516. For instance, the macro view may be extracted from (e.g., copied from, cropped from, etc.) the source image between the top coordinate and the bottom coordinate. - In a case that the difference is greater than the first threshold, the apparatus may determine a target zone based on the zone region quantities at
step 518. In some examples, the apparatus may determine the target zone as described in one, some, or all of FIG. 1-4. For instance, the apparatus may select a zone that has a greatest quantity of associated regions as the target zone. - At
step 520, the apparatus may determine whether a top region point is greater than a second threshold (e.g., half height of the source image, 1080/2, etc.). In some examples, the apparatus may determine whether the top region point is greater than the second threshold as described in one, some, or all of FIG. 1-4. In some examples, the first threshold and the second threshold may be the same quantity or different quantities. - In a case that the top region point is greater than the second threshold, the apparatus may determine a macro view top boundary based on a bottom region point at
step 522. In some examples, the apparatus may determine the macro view top boundary as described in one, some, or all of FIG. 1-4. In some examples, the apparatus may determine the macro view top boundary as the bottom region point minus a value (e.g., half height of the source image, 1080/2, second threshold, another value, etc.). - At
step 526, the apparatus may generate a macro view from the source image based on the top boundary. In some examples, the apparatus may generate the macro view based on the top boundary as described in one, some, or all of FIG. 1-4. For instance, the apparatus may extract the macro view from (e.g., copy the macro view from, crop the macro view from) the source image from the top boundary with a macro view size (e.g., 1920×540, source image width and half height, etc.). In some examples, the apparatus may extract the macro view from between the top boundary and the bottom region point. - In a case that the top region point is not greater than the second threshold, the apparatus may determine a macro view top boundary based on a top region point at
step 524. In some examples, the apparatus may determine the macro view top boundary as described in one, some, or all of FIG. 1-4. For instance, the apparatus may determine the macro view top boundary as the top region point minus a value (e.g., eighth height of the source image, 1080/8, margin size, another value, etc.). At step 526, the apparatus may generate a macro view from the source image based on the top boundary. - In some examples, the
method 500 may include generating a compound view based on the macro view (and a focus cell(s), for instance), displaying the compound view, transmitting the compound view, saving the compound view, or a combination thereof. For instance, the apparatus may generate the compound view as described in one, some, or all of FIG. 1-4. -
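The decision logic of steps 508-526 can be condensed into a minimal sketch, assuming y coordinates increase downward and regions are (x, y, width, height) tuples. The four-zone split, the half-height thresholds, and the 1/8-height margin follow the examples above; the function names and the choice to bucket regions by their top edge are assumptions:

```python
def target_zone(regions, image_height=1080, num_zones=4):
    """Steps 508-518: pick the horizontal zone containing the most regions."""
    zone_h = image_height // num_zones
    counts = [0] * num_zones
    for _, y, _, _ in regions:
        counts[min(y // zone_h, num_zones - 1)] += 1
    return max(range(num_zones), key=lambda z: counts[z])

def macro_view_rows(top_coord, bottom_coord, top_point, bottom_point,
                    image_height=1080):
    """Steps 514-526: return (top, bottom) row bounds of the macro view.

    top_coord/bottom_coord span all regions; top_point/bottom_point
    belong to the target zone's regions.
    """
    half = image_height // 2
    if bottom_coord - top_coord <= half:           # step 516
        return top_coord, bottom_coord
    if top_point > half:                           # steps 520, 522
        top_boundary = bottom_point - half
    else:                                          # step 524
        top_boundary = max(top_point - image_height // 8, 0)
    return top_boundary, top_boundary + half       # step 526

compact = macro_view_rows(200, 600, 200, 600)      # all regions fit in half
wide_low = macro_view_rows(100, 900, 700, 900)     # target regions sit low
wide_high = macro_view_rows(100, 900, 100, 300)    # target regions sit high
```

In the compact case the crop is exactly the span of the regions; otherwise the crop is a fixed half-height window anchored by the target zone's points.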
FIG. 6 is a block diagram illustrating an example of a computer-readable medium 650 for compound image generation. The computer-readable medium 650 is a non-transitory, tangible computer-readable medium. In some examples, the computer-readable medium 650 may be, for example, RAM, DRAM, EEPROM, MRAM, PCRAM, a storage device, an optical disc, the like, or a combination thereof. In some examples, the computer-readable medium 650 may be volatile memory, non-volatile memory, or a combination thereof. In some examples, the memory 306 described in FIG. 3 may be an example of the computer-readable medium 650 described in FIG. 6. - The computer-
readable medium 650 may include data (e.g., information, executable instructions, or a combination thereof). In some examples, the computer-readable medium 650 may include sorting instructions 652, focus cell generation instructions 654, macro view generation instructions 656, compound image instructions 658, or a combination thereof. - The sorting
instructions 652 may include instructions when executed cause a processor of an electronic device to sort a plurality of regions of a source image to determine that a first person in a first region and a second person in a second region are furthest from an image sensor. In some examples, sorting a plurality of regions may be performed as described in one, some, or all of FIG. 1-5. - The focus cell generation instructions 654 may include instructions when executed cause the processor to generate a first focus cell including the first person alone based on the first region and a second focus cell including the second person alone based on the second region. In some examples, generating focus cells may be performed as described in one, some, or all of
FIG. 1-5. - The macro
view generation instructions 656 may include instructions when executed cause the processor to generate a macro view of the source image that depicts the first person and the second person. In some examples, generating the macro view may be performed as described in one, some, or all of FIG. 1-5. - The
compound image instructions 658 may include instructions when executed cause the processor to generate a compound image including the macro view, the first focus cell, and the second focus cell. In some examples, generating the compound image may be performed as described in one, some, or all of FIG. 1-5. In some examples, the macro view occupies half (e.g., top half or bottom half) of the compound image. - In some examples, the
compound image instructions 658 may include instructions when executed cause the processor to remove the first focus cell and add a third focus cell in response to a detection of a third person speaking. For instance, the first focus cell may be removed from the compound image or may be omitted in a subsequent compound image based on a subsequent source image. In some examples, removing a focus cell(s), adding a focus cell(s), or a combination thereof may be performed as described in one, some, or all of FIG. 1-5. For instance, the first focus cell may be removed from a left side, the second focus cell may be shifted leftward, and the third focus cell may be added to a right side. - As used herein, items described with the term “or a combination thereof” may mean an item or items. For example, the phrase “A, B, C, or a combination thereof” may mean any of: A (without B and C), B (without A and C), C (without A and B), A and B (without C), B and C (without A), A and C (without B), or all of A, B, and C.
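The focus-cell replacement behavior described above (remove the leftmost cell, shift the remainder leftward, append the new speaker's cell on the right) can be sketched as follows; the two-cell capacity and the plain-list representation are assumptions for illustration:

```python
def update_focus_cells(cells, new_cell, max_cells=2):
    """On detecting a new active speaker, append that speaker's focus cell
    on the right side; if over capacity, drop the leftmost cell so the
    remaining cells shift leftward."""
    cells = list(cells) + [new_cell]
    if len(cells) > max_cells:
        cells.pop(0)  # remove from the left side
    return cells

cells = update_focus_cells(["first_person", "second_person"], "third_person")
```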
- While various examples are described herein, the described techniques are not limited to the examples. Variations of the examples are within the scope of the disclosure. For example, operation(s), aspect(s), or element(s) of the examples described herein may be omitted or combined.
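As one illustrative reading of the focus-cell generation described earlier (cropping a region's pixel data and scaling it to a fixed cell size), here is a nearest-neighbor sketch over single-channel list-of-lists pixel data; the cell dimensions, the sampling method, and the function name are assumptions, since the disclosure permits any of scaling, shifting, interpolation, or transformation:

```python
def make_focus_cell(image, region, cell_w=480, cell_h=540):
    """Crop the (x, y, width, height) region and scale the crop to
    cell_w x cell_h using nearest-neighbor sampling."""
    x, y, w, h = region
    crop = [row[x:x + w] for row in image[y:y + h]]
    return [[crop[j * h // cell_h][i * w // cell_w] for i in range(cell_w)]
            for j in range(cell_h)]

source = [[0] * 1920 for _ in range(1080)]   # synthetic 1920x1080 frame
cell = make_focus_cell(source, (100, 200, 80, 120))
```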
Claims (20)
1. An electronic device, comprising:
an image sensor to capture a source image; and
a processor to:
determine, in the source image, a first region that depicts a first person and a second region that depicts a second person;
in response to determining that the first person is further away than the second person relative to the image sensor based on the first region and the second region, generate a first focus cell that depicts the first person alone;
generate a macro view of the source image that depicts the first person and the second person, wherein the macro view is generated from the source image between a top coordinate of a top region in the source image and a bottom coordinate of a bottom region in the source image in response to determining that a difference between the top coordinate and the bottom coordinate is greater than a threshold; and
instruct display of a compound image including the macro view and the first focus cell.
2. The electronic device of claim 1, wherein the source image has a field of view that is greater than sixty degrees along a horizontal dimension.
3. The electronic device of claim 1, wherein the processor is to determine a first quantity of pixels in the first region and a second quantity of pixels in the second region.
4. The electronic device of claim 3, wherein the processor is to determine that the first person is further away than the second person when the first quantity of pixels in the first region is less than the second quantity of pixels in the second region.
5. The electronic device of claim 1, wherein the processor is to order the first focus cell and a second focus cell along a horizontal dimension.
6. The electronic device of claim 1, wherein the processor is to generate a second focus cell that depicts the second person alone, wherein the compound image includes the first focus cell and the second focus cell.
7. (canceled)
8. The electronic device of claim 1, wherein the processor is to partition the source image into a first zone and a second zone.
9. The electronic device of claim 8, wherein the processor is to determine a first zone region quantity of the first zone and a second zone region quantity of the second zone.
10. The electronic device of claim 9, wherein the processor is to determine a target zone based on the first zone region quantity and the second zone region quantity.
11. The electronic device of claim 1, wherein the processor is to determine a top region point associated with a zone in the source image, and wherein the processor is to generate the macro view based on the top region point.
12. The electronic device of claim 11, wherein the processor is to determine a bottom region point associated with the zone in the source image, and wherein the processor is to generate the macro view based on the bottom region point.
13. An apparatus, comprising:
a communication interface to receive a source image from an image sensor, wherein the source image depicts a first person and a second person; and
a processor to:
determine, in the source image, a first region that includes the first person and a second region that includes the second person;
determine that the first person is further away from the image sensor than the second person based on a first area of the first region and a second area of the second region;
determine a horizontal order of the first region and the second region based on a first horizontal position of the first region and a second horizontal position of the second region;
generate a first focus cell based on the first region and a second focus cell based on the second region;
generate a macro view of the source image that depicts the first person and the second person, wherein the macro view is generated from the source image between a top coordinate of a top region in the source image and a bottom coordinate of a bottom region in the source image in response to determining that a difference between the top coordinate and the bottom coordinate is greater than a threshold; and
generate a compound image including the first focus cell and the second focus cell in the horizontal order and the macro view.
14. (canceled)
15. The apparatus of claim 13, wherein the threshold is a fraction of a vertical size of the source image.
16. The apparatus of claim 13, wherein the processor is to generate the macro view from the source image between the bottom coordinate and the top coordinate in response to determining that the difference is not greater than the threshold.
17. A non-transitory tangible computer-readable medium comprising instructions when executed cause a processor of an electronic device to:
sort a plurality of regions of a source image to determine that a first person in a first region and a second person in a second region are furthest from an image sensor;
generate a first focus cell including the first person alone based on the first region and a second focus cell including the second person alone based on the second region;
generate a macro view of the source image that depicts the first person and the second person, wherein the macro view is generated from the source image between a top coordinate of a top region in the source image and a bottom coordinate of a bottom region in the source image in response to determining that a difference between the top coordinate and the bottom coordinate is greater than a threshold; and
generate a compound image including the macro view, the first focus cell, and the second focus cell.
18. The non-transitory tangible computer-readable medium of claim 17, wherein the macro view occupies half of the compound image.
19. The non-transitory tangible computer-readable medium of claim 17, wherein the instructions when executed cause the processor to remove the first focus cell and add a third focus cell in response to a detection of a third person speaking.
20. The non-transitory tangible computer-readable medium of claim 19, wherein the first focus cell is removed from a left side, the second focus cell is shifted leftward, and the third focus cell is added to a right side.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/861,713 US11881025B1 (en) | 2022-07-11 | 2022-07-11 | Compound images |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/861,713 US11881025B1 (en) | 2022-07-11 | 2022-07-11 | Compound images |
Publications (2)
Publication Number | Publication Date |
---|---|
US20240013536A1 true US20240013536A1 (en) | 2024-01-11 |
US11881025B1 US11881025B1 (en) | 2024-01-23 |
Family
ID=89431591
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/861,713 Active US11881025B1 (en) | 2022-07-11 | 2022-07-11 | Compound images |
Country Status (1)
Country | Link |
---|---|
US (1) | US11881025B1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9798933B1 (en) * | 2016-12-12 | 2017-10-24 | Logitech Europe, S.A. | Video conferencing system and related methods |
US20230247071A1 (en) * | 2022-01-31 | 2023-08-03 | Zoom Video Communications, Inc. | Concurrent Region Of Interest-Based Video Stream Capture At Normalized Resolutions |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050008240A1 (en) | 2003-05-02 | 2005-01-13 | Ashish Banerji | Stitching of video for continuous presence multipoint video conferencing |
US7034860B2 (en) | 2003-06-20 | 2006-04-25 | Tandberg Telecom As | Method and apparatus for video conferencing having dynamic picture layout |
US20070040900A1 (en) | 2005-07-13 | 2007-02-22 | Polycom, Inc. | System and Method for Configuring Routing of Video from Multiple Sources to Multiple Destinations of Videoconference Using Software Video Switch |
US7932919B2 (en) | 2006-04-21 | 2011-04-26 | Dell Products L.P. | Virtual ring camera |
NO326793B1 (en) * | 2006-12-29 | 2009-02-16 | Tandberg Telecom As | Method and apparatus for displaying close-up images in video conferencing |
JP7248345B2 (en) * | 2019-03-11 | 2023-03-29 | Necソリューションイノベータ株式会社 | Image processing device, image processing method and program |
JP7427408B2 (en) * | 2019-10-07 | 2024-02-05 | シャープ株式会社 | Information processing device, information processing method, and information processing program |
US11748845B2 (en) * | 2021-01-27 | 2023-09-05 | Nvidia Corporation | Machine learning techniques for enhancing video conferencing applications |
US11350029B1 (en) * | 2021-03-29 | 2022-05-31 | Logitech Europe S.A. | Apparatus and method of detecting and displaying video conferencing groups |
US20220400244A1 (en) * | 2021-06-15 | 2022-12-15 | Plantronics, Inc. | Multi-camera automatic framing |
-
2022
- 2022-07-11 US US17/861,713 patent/US11881025B1/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9798933B1 (en) * | 2016-12-12 | 2017-10-24 | Logitech Europe, S.A. | Video conferencing system and related methods |
US20230247071A1 (en) * | 2022-01-31 | 2023-08-03 | Zoom Video Communications, Inc. | Concurrent Region Of Interest-Based Video Stream Capture At Normalized Resolutions |
Also Published As
Publication number | Publication date |
---|---|
US11881025B1 (en) | 2024-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11727577B2 (en) | Video background subtraction using depth | |
US10694146B2 (en) | Video capture systems and methods | |
CN107993216B (en) | Image fusion method and equipment, storage medium and terminal thereof | |
EP3063730B1 (en) | Automated image cropping and sharing | |
US9934823B1 (en) | Direction indicators for panoramic images | |
CN112074865A (en) | Generating and displaying blur in an image | |
CN111448568B (en) | Environment-based application presentation | |
US11854230B2 (en) | Physical keyboard tracking | |
US9824723B1 (en) | Direction indicators for panoramic images | |
US10084970B2 (en) | System and method for automatically generating split screen for a video of a dynamic scene | |
US11496710B2 (en) | Image display method for video conferencing system with wide-angle webcam | |
US20200336656A1 (en) | Systems and methods for real time screen display coordinate and shape detection | |
US10602077B2 (en) | Image processing method and system for eye-gaze correction | |
WO2022260797A1 (en) | Adjusting participant gaze in video conferences | |
TW202320019A (en) | Image modification techniques | |
EP2953351B1 (en) | Method and apparatus for eye-line augmentation during a video conference | |
US20230291993A1 (en) | Adaptive multi-scale face and body detector | |
US11881025B1 (en) | Compound images | |
JP4165571B2 (en) | Image processing apparatus and method, and program | |
US20210304426A1 (en) | Writing/drawing-to-digital asset extractor | |
US11871104B2 (en) | Recommendations for image capture | |
US20230289919A1 (en) | Video stream refinement for dynamic scenes | |
WO2024051289A1 (en) | Image background replacement method and related device | |
CN116582637A (en) | Screen splitting method of video conference picture and related equipment | |
WO2023113948A1 (en) | Immersive video conference system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HUNG, CHIH-CHEN;CHEN, HUNG-MING;CHUANG, CHIA-WEN;SIGNING DATES FROM 20220710 TO 20220711;REEL/FRAME:060475/0498 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |