WO2024043752A1 - Method and electronic device for motion-based image enhancement

Method and electronic device for motion-based image enhancement

Info

Publication number
WO2024043752A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
motion
electronic device
regions
key points
Prior art date
Application number
PCT/KR2023/012652
Other languages
English (en)
Inventor
Bindigan Hariprasanna PAWAN PRASAD
Green Rosh K S
Vishakha S R
Original Assignee
Samsung Electronics Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd. filed Critical Samsung Electronics Co., Ltd.
Publication of WO2024043752A1

Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T5/73
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/215 Motion-based segmentation
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20172 Image enhancement details
    • G06T2207/20201 Motion blur correction

Definitions

  • the present invention relates to an electronic device, and more specifically to a method and an electronic device for motion-based image enhancement.
  • the present application is based on and claims priority from Indian Provisional Application Number 202241048869 filed on 26th August 2022, the disclosure of which is hereby incorporated by reference herein.
  • Image enhancement has recently gained widespread attention, particularly in consumer markets of smartphones.
  • Leading smartphone vendors have recently made exceptional progress in image enhancement areas such as High Dynamic Range (HDR) and low light de-noising.
  • image capture of moving subjects such as humans often results in artefacts such as blur (1), and capture in the absence of good lighting conditions often results in artefacts such as low light noise (2), as illustrated in FIG. 1.
  • Image enhancement via artefact reduction is critical for both aesthetics and downstream computer vision tasks.
  • Multi-frame algorithms such as Multi-Frame Noise Removal (MFNR) and the HDR are commonly used in image enhancement methods.
  • the multi-frame algorithms frequently compute motion maps.
  • the motion maps are frequently computed using photometric difference-based methods or human key points-based methods.
  • in the presence of blur (1), low light noise (2), or ghosts (3), these approaches frequently result in false positive motion.
  • an output image has more noise (2) or a lower dynamic range (4).
  • the photometric difference-based methods use a photometric alignment (optionally for HDR) of each pixel followed by a photometric difference.
  • the motion map generation is prone to errors.
  • large areas of false positive motion are produced.
  • the large areas of a false positive motion result in less blending of regions, which further results in a loss of dynamic range or an increase in noise, as illustrated in FIG. 2a and FIG. 2b.
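  • As a rough illustration of the photometric-difference approach described above, the sketch below computes a binary motion map from two exposure-aligned frames; the function name, the scalar gain used for alignment, and the fixed difference threshold are assumptions made only for this sketch, not the disclosed implementation.

```python
import numpy as np

def photometric_motion_map(ref_frame, aux_frame, gain=1.0, threshold=0.08):
    """Naive photometric-difference motion map between two aligned frames.

    ref_frame, aux_frame: float arrays in [0, 1] of the same shape (H, W).
    gain: simple photometric (exposure) alignment factor applied to aux_frame.
    threshold: per-pixel difference above which a pixel is marked as motion.
    """
    aligned = np.clip(aux_frame * gain, 0.0, 1.0)   # crude exposure alignment
    diff = np.abs(ref_frame - aligned)              # photometric difference
    return (diff > threshold).astype(np.float32)    # 1 = motion, 0 = static

# Noise or blur inflates |diff| even in static areas, which is how the
# false-positive motion regions discussed above arise.
```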
  • the human key points-based methods estimate human poses by computing human key points which are then analyzed to detect motion. In the presence of high noise/blur, the estimated human key points are erroneous, which further leads to a classification of static regions as motion (false positive motion). Subsequently, this leads to the lower dynamic range (4) or higher noise (2).
  • the principal object of the embodiments herein is to intelligently generate an image by identifying one or more regions with image artefact(s) (e.g., a blur region, a region with a lot of movement, etc.) from a plurality of regions in received image frame(s) to be enhanced based on a motion characteristic(s) associated with a plurality of estimated key points associated with a subject(s) of the received image frame(s) and an action(s) performed by the subject(s) using the plurality of estimated key points.
  • the enhanced image includes one or more enhanced regions that are free of the image artefacts when compared to the one or more regions from the plurality of regions of the received image frame(s), which enhances user experience.
  • Another object of the embodiment herein is to determine an optimal motion map from a plurality of optimal image frames by predicting a local motion region(s) (e.g., user's leg) in the received image frame(s) based on the detected action(s) (e.g., user's jump) and the plurality of estimated key points, where the plurality of optimal image frames includes a peak action(s) (e.g., user's jump in air) of the detected action(s).
  • the optimal motion map is utilized to generate the enhanced image (e.g. HDR image, de-noised image, blur-corrected image, reflection removed image, etc.).
  • the embodiment herein is to provide a method for motion-based image enhancement.
  • the method includes receiving, by the electronic device, an image frame(s) including a subject(s) performing an action(s). Further, the method includes determining, by the electronic device, the plurality of key points associated with the subject(s) of the received image frame(s). Further, the method includes detecting, by the electronic device, the action(s) performed by the subject(s) using the plurality of estimated key points. Further, the method includes determining, by the electronic device, a motion characteristic(s) associated with the plurality of estimated key points.
  • the method includes identifying, by the electronic device, one or more regions from a plurality of regions in the received image frame(s) to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s). Further, the method includes generating, by the electronic device, an enhanced image including the one or more enhanced regions compared to the one or more regions from the plurality of regions of the received image frame(s). Further, the method includes storing, by the electronic device, the enhanced image comprising the one or more enhanced regions of the plurality of regions.
  • identifying, by the electronic device, the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s) includes determining, by the electronic device, a plurality of optimal image frames from the received image frame(s) based on the detected action(s), where the plurality of optimal image frames includes a peak action(s) of the detected action(s). Further, the method includes predicting, by the electronic device, a local motion region(s) in the received image frame(s) based on the detected action(s).
  • the method includes determining, by the electronic device, an optimal motion map in the plurality of optimal image frames based on the predicted local motion region(s) and the plurality of estimated key points. Further, the method includes performing, by the electronic device, localization of spatial-temporal artefacts for the plurality of optimal image frames based on the determined optimal motion map, the determined motion characteristic(s) associated with the plurality of estimated key points and the detected action(s).
  • the method includes identifying, by the electronic device, the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the localization of spatial-temporal artefacts, where the one or more regions includes an image artefact(s), and the image artefact(s) includes a blur region, a noise region, a dark region, and a motion region.
  • determining, by the electronic device, the optimal motion map in the plurality of optimal image frames based on the predicted local motion region(s) and the plurality of estimated key points includes generating, by the electronic device, an initial motion map of the plurality of optimal image frames based on an image restoration mechanism. Further, the method includes generating, by the electronic device, a digital skeleton by connecting the plurality of estimated key points. Further, the method includes retrieving, by the electronic device, a motion probability of key points and bones of the generated digital skeleton from a pre-defined dictionary of a database of the electronic device for the detected action(s). For each action, a probability of motion for each key point is computed.
  • the method includes updating, by the electronic device, the generated digital skeleton based on the retrieved motion probability of key points and bones. Further, the method includes determining, by the electronic device, the optimal motion map based on the predicted local motion region(s), the generated initial motion map and the updated digital skeleton.
  • performing, by the electronic device, localization of spatial-temporal artefacts for the plurality of optimal image frames includes determining, by the electronic device, a standard deviation of noise of the plurality of optimal image frames using a classical learning mechanism and a deep learning mechanism to identify the one or more regions from the plurality of regions in the received image frame(s) to be enhanced.
  • the standard deviation of the image in every region can serve as an estimate of the noise.
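  • A minimal sketch of the per-region noise estimate mentioned above, assuming a fixed block size and using the plain block standard deviation in place of a learned estimator:

```python
import numpy as np

def blockwise_noise_estimate(image, block=32):
    """Estimate noise strength as the standard deviation of each image block.

    image: 2-D float array (H, W). Returns an (H//block, W//block) map of
    per-block standard deviations, a crude stand-in for a learned estimator.
    """
    h, w = image.shape
    h, w = h - h % block, w - w % block
    blocks = image[:h, :w].reshape(h // block, block, w // block, block)
    return blocks.std(axis=(1, 3))
```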
  • the method includes determining, by the electronic device, at least one static region from the plurality of regions in the at least one received image frame. Further, the method includes determining, by the electronic device, at least one variation key point in the at least one static region.
  • the method includes determining, by the electronic device, a motion parameter(s) of each key point in the predicted local motion region(s) based on pose estimation error and the plurality of estimated key points, where the motion parameter(s) includes a displacement, a velocity, and an acceleration. Further, the method includes determining, by the electronic device, a motion between subsequent frames of the plurality of optimal image frames using the determined motion parameter(s). Further, the method includes determining, by the electronic device, a size of blur-kernel based on the determined motion to identify the one or more regions from the plurality of regions in the received image frame(s) to be enhanced.
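  • The motion parameters above can be sketched as follows, assuming key points are tracked over T frames at a known frame interval; the linear mapping from per-frame displacement to blur-kernel size is an illustrative assumption.

```python
import numpy as np

def keypoint_motion_parameters(keypoints, dt=1 / 30):
    """Displacement, velocity and acceleration of tracked key points.

    keypoints: array of shape (T, K, 2) with (x, y) positions of K key points
    over T frames. dt: time between frames in seconds.
    """
    displacement = np.linalg.norm(np.diff(keypoints, axis=0), axis=-1)  # (T-1, K)
    velocity = displacement / dt
    acceleration = np.diff(velocity, axis=0) / dt                       # (T-2, K)
    return displacement, velocity, acceleration

def blur_kernel_size(displacement_px, exposure_s, dt=1 / 30):
    """Assumed linear model: blur extent ~ motion covered during the exposure."""
    size = int(round(displacement_px * exposure_s / dt))
    return max(size | 1, 1)  # force an odd, non-zero kernel size
```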
  • generating the enhanced image includes applying an image enhancement mechanism to generate at least one of a High Dynamic Range (HDR) image, a de-noised image, a blur-corrected image, and a reflection-removed image.
  • generating the HDR image includes clustering, by the electronic device, the identified one or more regions from the plurality of regions in the received image frame(s) and the received image frame(s) into a plurality of frame groups based on the determined motion characteristic(s) associated with the plurality of estimated key points and the detected action(s), where the plurality of frame groups includes a number of frames with a lowest displacement, a number of frames with a medium displacement, and a number of frames with a highest displacement.
  • the method includes generating, by the electronic device, a high exposure frame from the number of frames with the lowest displacement.
  • the method includes generating, by the electronic device, medium exposure frame from the number of frames with the medium displacement.
  • the method includes generating, by the electronic device, a low exposure frame from the number of frames with the highest displacement. Further, the method includes blending, by the electronic device, the generated high exposure frame, the generated medium exposure frame, and the generated low exposure frame to generate the HDR image.
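  • The HDR flow above might look roughly like the sketch below; the displacement thresholds, the per-group averaging used to synthesise each exposure frame, and the blend weights are assumptions made for illustration, not the disclosed implementation.

```python
import numpy as np

def hdr_from_groups(frames, displacements, low_t=2.0, high_t=8.0):
    """frames: list of (H, W) float arrays; displacements: per-frame key point
    displacement of the same length. Frames are grouped by displacement, each
    group is averaged into a synthetic exposure, and the exposures are blended."""
    frames = np.stack(frames)
    d = np.asarray(displacements)
    groups = {
        "high_exposure":   frames[d < low_t],                  # most frames, least motion
        "medium_exposure": frames[(d >= low_t) & (d < high_t)],
        "low_exposure":    frames[d >= high_t],                # fewest frames, most motion
    }
    exposures = [g.mean(axis=0) for g in groups.values() if len(g)]
    weights = np.linspace(1.0, 0.5, num=len(exposures))        # toy blend weights
    return np.average(np.stack(exposures), axis=0, weights=weights)
```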
  • generating the blur-corrected image includes determining, by the electronic device, whether the motion parameter(s) exceeds a pre-defined threshold. Blur correction needs to be done only if the motion parameter is above the pre-defined threshold. Further, the method includes applying, by the electronic device, blur correction to regions surrounding the key points whose measured motion parameters exceed the pre-defined threshold. Further, the method includes generating, by the electronic device, the blur-corrected image by applying the blur correction to regions surrounding the key points whose measured motion parameters exceed the pre-defined threshold.
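  • A sketch of the thresholded blur correction above; the patch radius and the unsharp mask used as a stand-in for the actual blur correction are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter  # stand-in for a real deblurring step

def correct_blur_near_keypoints(image, keypoints, motion, threshold=5.0, radius=24):
    """Apply a simple unsharp mask only around key points whose motion
    parameter exceeds the threshold; all other regions are left untouched."""
    out = image.copy()
    blurred = gaussian_filter(image, sigma=2.0)
    sharpened = np.clip(image + 0.8 * (image - blurred), 0.0, 1.0)
    h, w = image.shape
    for (x, y), m in zip(keypoints, motion):
        if m <= threshold:
            continue  # blur correction is needed only above the threshold
        x0, x1 = max(int(x) - radius, 0), min(int(x) + radius, w)
        y0, y1 = max(int(y) - radius, 0), min(int(y) + radius, h)
        out[y0:y1, x0:x1] = sharpened[y0:y1, x0:x1]
    return out
```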
  • generating the reflection removed image includes determining, by the electronic device, a correlation between the determined motion characteristic(s) with the plurality of estimated key points of a first subject with the determined motion characteristic(s) with the plurality of estimated key points of a second subject. Further, the method includes classifying, by the electronic device, a highly correlated key point(s) of the second subject as reflection key points. Further, the method includes generating, by the electronic device, a reflection map using the classified highly correlated key point(s). Further, the method includes generating, by the electronic device, the reflection removed image using the generated reflection map.
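  • The correlation test above can be sketched as follows; using the Pearson correlation of a per-key-point motion characteristic and the 0.9 cut-off are illustrative assumptions.

```python
import numpy as np

def reflection_keypoints(subject_tracks, candidate_tracks, cutoff=0.9):
    """subject_tracks, candidate_tracks: arrays of shape (T, K) holding a motion
    characteristic (e.g. per-frame displacement) for K corresponding key points
    of the first and second subject. Key points of the second subject whose
    motion is highly correlated with the first subject are classified as
    reflection key points."""
    reflected = []
    for k in range(subject_tracks.shape[1]):
        corr = np.corrcoef(subject_tracks[:, k], candidate_tracks[:, k])[0, 1]
        if corr > cutoff:
            reflected.append(k)
    return reflected
```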
  • identifying, by the electronic device, one or more regions from a plurality of regions in the at least one received image frame to be enhanced based on the at least one determined motion characteristic with the plurality of estimated key points and the at least one detected action comprises: comparing, by the electronic device (100), the computed values of the one or more motion characteristics associated with each of the plurality of estimated key points with expected values; determining, by the electronic device (100), a deviation of the computed values of each of the plurality of estimated key points from the expected values; and determining, by the electronic device (100), a first set of key points of the plurality of estimated key points having the deviation greater than a threshold value.
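  • A minimal sketch of the deviation test above; how the expected values are pre-computed is not detailed here, so they are simply passed in as an assumed look-up.

```python
import numpy as np

def keypoints_to_enhance(computed, expected, threshold=3.0):
    """computed, expected: (K, C) arrays of motion characteristics per key point
    (e.g. displacement, velocity, acceleration). Returns indices of the first
    set of key points whose deviation from the expected values exceeds the
    threshold."""
    deviation = np.linalg.norm(computed - expected, axis=1)
    return np.nonzero(deviation > threshold)[0]
```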
  • the embodiment herein is to provide the electronic device for motion-based image enhancement.
  • the electronic device includes an image processing controller coupled with a processor and a memory.
  • the image processing controller receives the image frame(s) including the subject(s) performing the action(s).
  • the image processing controller determines the plurality of key points associated with the subject(s) of the received image frame(s).
  • the image processing controller detects the action(s) performed by the subject(s) using the plurality of estimated key points.
  • the image processing controller determines the motion characteristic(s) associated with the plurality of estimated key points.
  • the image processing controller identifies the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s).
  • the image processing controller generates the enhanced image including the one or more enhanced regions compared to the one or more regions from the plurality of regions of the received image frame(s).
  • the image processing controller stores the enhanced image comprising the one or more enhanced regions of the plurality of regions.
  • the electronic device may obtain an enhanced image.
  • FIG. 1 illustrates a problem in a conventional image enhancement mechanism caused by the presence of moving subjects, according to the prior art;
  • FIG. 2a and FIG. 2b are an example scenario illustrating a problem in an existing HDR image enhancement mechanism, according to the prior art;
  • FIG. 3 illustrates a block diagram of an electronic device for motion-based image enhancement, according to an embodiment as disclosed herein;
  • FIG. 4 is a flow diagram illustrating a method for the motion-based image enhancement, according to an embodiment as disclosed herein;
  • FIG. 5 is a system flow diagram illustrating the method for the motion-based image enhancement, according to an embodiment as disclosed herein;
  • FIG. 6 illustrates various operations associated with an action-based artefact region localizer for the motion-based image enhancement, according to an embodiment as disclosed herein;
  • FIG. 7 illustrates various operations associated with a peak action identifier and a local motion predictor for the motion-based image enhancement, according to an embodiment as disclosed herein;
  • FIG. 8 illustrates various operations associated with a region identifier for motion localizer for the motion-based image enhancement, according to an embodiment as disclosed herein;
  • FIG. 9 illustrates various operations associated with a spatial-temporal artefacts localizer for the motion-based image enhancement, according to an embodiment as disclosed herein;
  • FIG. 10 illustrates various operations associated with an image enhancer to generate an HDR image, according to an embodiment as disclosed herein;
  • FIG. 11 is a flow diagram illustrating a method for generating a blur-corrected image using the image enhancer, according to an embodiment as disclosed herein;
  • FIG. 12 illustrates various operations associated with the image enhancer to generate a de-noised image, according to an embodiment as disclosed herein;
  • FIG. 13a and FIG. 13b are an example flow diagram illustrating the method for the motion-based image enhancement, according to an embodiment as disclosed herein.
  • circuits may, for example, be embodied in one or more semiconductor chips, or on substrate supports such as printed circuit boards and the like.
  • circuits constituting a block may be implemented by dedicated hardware, or by a processor (e.g., one or more programmed microprocessors and associated circuitry), or by a combination of dedicated hardware to perform some functions of the block and a processor to perform other functions of the block.
  • Each block of the embodiments may be physically separated into two or more interacting and discrete blocks without departing from the scope of the disclosure.
  • the blocks of the embodiments may be physically combined into more complex blocks without departing from the scope of the disclosure.
  • FIG. 2a and FIG. 2b are an example scenario illustrating a problem in an existing HDR image enhancement mechanism, according to prior art.
  • the existing HDR image enhancement mechanism receives a plurality of image frames (5 and 6) including a subject (e.g. human) performing an action(s) (e.g., jump).
  • the existing HDR image enhancement mechanism then performs an exposure alignment (7 and 8) on the received plurality of image frames (5 and 6).
  • the existing HDR image enhancement mechanism determines a photometric difference of the exposure alignment (7 and 8) frames.
  • the existing HDR image enhancement mechanism then generates an initial motion map (10).
  • the generated initial motion map (10) is prone to errors. As a result, large areas of false positive motion are produced (11).
  • the large areas of false positive motion result in less blending of these regions, resulting in a loss of dynamic range, an increase in noise, or dark artefacts, which have a negative impact on the user experience.
  • a novel method is proposed for image enhancement that uses action recognition to localise motion regions, which is resistant to the artefacts such as noise and blur.
  • the embodiment herein is to provide a method for motion-based image enhancement.
  • the method includes receiving, by the electronic device, an image frame(s) including a subject(s) performing an action(s). Further, the method includes determining, by the electronic device, the plurality of key points associated with the subject(s) of the received image frame(s). Further, the method includes detecting, by the electronic device, the action(s) performed by the subject(s) using the plurality of estimated key points. Further, the method includes determining, by the electronic device, a motion characteristic(s) associated with the plurality of estimated key points.
  • the method includes identifying, by the electronic device, one or more regions from a plurality of regions in the received image frame(s) to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s). Further, the method includes generating, by the electronic device, an enhanced image including the one or more enhanced regions compared to the one or more regions from the plurality of regions of the received image frame(s). Further, the method includes storing, by the electronic device, the enhanced image comprising the one or more enhanced regions of the plurality of regions.
  • the embodiment herein is to provide the electronic device for motion-based image enhancement.
  • the electronic device includes an image processing controller coupled with a processor and a memory.
  • the image processing controller receives the image frame(s) including the subject(s) performing the action(s).
  • the image processing controller determines the plurality of key points associated with the subject(s) of the received image frame(s).
  • the image processing controller detects the action(s) performed by the subject(s) using the plurality of estimated key points.
  • the image processing controller determines the motion characteristic(s) associated with the plurality of estimated key points.
  • the image processing controller identifies the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s).
  • the image processing controller generates the enhanced image including the one or more enhanced regions compared to the one or more regions from the plurality of regions of the received image frame(s).
  • the image processing controller stores the enhanced image comprising the one or more enhanced regions of the plurality of regions.
  • the proposed method enables the electronic device to intelligently generate the image by identifying one or more regions with image artefact(s) (e.g., blur region, region with lot of movement, etc.) from a plurality of regions in received image frame(s) to be enhanced based on the motion characteristic(s) associated with the plurality of estimated key points associated with the subject(s) of the received image frame(s) and the action(s) performed by the subject(s) using the plurality of estimated key points.
  • the enhanced image includes one or more enhanced regions that are free of the image artefacts when compared to the one or more regions from the plurality of regions of the received image frame(s), which enhances user experience.
  • the proposed method enables the electronic device to determine an optimal motion map in a plurality of optimal image frames by predicting a local motion region(s) (e.g., user's leg) in the received image frame(s) based on the detected action(s) (e.g., user's jump) and the plurality of estimated key points, where the plurality of optimal image frames includes a peak action(s) (e.g., user's jump in air) of the detected action(s).
  • the optimal motion map is utilized to generate the enhanced image (e.g. HDR image, de-noised image, blur-corrected image, reflection removed image, etc.).
  • Referring now to FIGS. 3 through 13, where similar reference characters denote corresponding features consistently throughout the figures, preferred embodiments are shown.
  • FIG. 3 illustrates a block diagram of an electronic device (100) for motion-based image enhancement, according to an embodiment as disclosed herein.
  • the electronic device (100) can be, for example, but is not limited to, a smart phone, a laptop, a desktop, a smart watch, a smart TV, an Augmented Reality device (AR device), a Virtual Reality device (VR device), an Internet of Things (IoT) device, or the like.
  • the electronic device (100) includes a memory (110), a processor (120), a communicator (130), a display (140), an image processing controller (150), and a camera (160).
  • the memory (110) stores a plurality of image frames with a subject(s), a plurality of key points associated with the subject(s) in a key point motion repository (111) of the memory (110), information associated with bone motion in a bone motion repository (112) of the memory (110), an action(s) performed by the subject(s), a plurality of optimal image frames, an optimal motion map in the plurality of optimal image frames, one or more regions with image artefact(s), and an enhanced image(s) with one or more enhanced regions of the plurality of regions.
  • the memory (110) stores instructions to be executed by the processor (120).
  • the memory (110) may include non-volatile storage elements.
  • non-volatile storage elements may include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.
  • the memory (110) may, in some examples, be considered a non-transitory storage medium.
  • the term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted that the memory (110) is non-movable.
  • the memory (110) can be configured to store larger amounts of information than the memory.
  • a non-transitory storage medium may store data that can, over time, change (e.g., in Random Access Memory (RAM) or cache).
  • the memory (110) can be an internal storage unit or it can be an external storage unit of the electronic device (100), a cloud storage, or any other type of external storage.
  • the processor (120) communicates with the memory (110), the communicator (130), the display (140), the image processing controller (150), and the camera (160).
  • the camera (160) includes a primary camera (160a) and a secondary camera (160b-160n) to capture the image frame(s).
  • the processor (120) is configured to execute instructions stored in the memory (110) and to perform various processes.
  • the processor (120) may include one or a plurality of processors, maybe a general-purpose processor, such as a Central Processing Unit (CPU), an Application Processor (AP), or the like, a Graphics-only Processing Unit such as a graphics processing unit (GPU), a Visual Processing Unit (VPU), and/or an Artificial Intelligence (AI) dedicated processor such as a Neural Processing Unit (NPU).
  • the communicator (130) is configured for communicating internally between internal hardware components and with external devices (e.g. server) via one or more networks (e.g. Radio technology).
  • the communicator (130) includes an electronic circuit specific to a standard that enables wired or wireless communication.
  • the display (140) can be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED), an Organic Light-Emitting Diode (OLED), or another type of display that can also accept user inputs. Touch, swipe, drag, gesture, voice command, and other user inputs are examples of user inputs.
  • the image processing controller (150) is implemented by processing circuitry such as logic gates, integrated circuits, microprocessors, microcontrollers, memory circuits, passive electronic components, active electronic components, optical components, hardwired circuits, or the like, and may optionally be driven by firmware.
  • the circuits may, for example, be embodied in one or more semiconductor chips, or on substrate supports such as printed circuit boards and the like.
  • the image processing controller (150) includes a pose estimator (151), an action recognizer (152), an action-based artefact region localizer (153), an image enhancer (154), and an Artificial Intelligence (AI) engine (155).
  • the pose estimator (151) receives the image frame(s) including the subject(s) (e.g., human, plant, animal, etc.) performing the action(s) (e.g., jump).
  • the pose estimator (151) determines the plurality of key points associated with the subject(s) of the received image frame(s).
  • the action recognizer (152) detects the action(s) performed by the subject(s) using the plurality of estimated key points.
  • the action recognizer (152) determines a motion characteristic(s).
  • the motion characteristic associated with the plurality of estimated key points can be, for example, but is not limited to, velocity, acceleration, or displacement.
  • the action-based artefact region localizer (153) identifies one or more regions from a plurality of regions in the received image frame(s) to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s).
  • the action-based artefact region localizer (153) determines a plurality of optimal image frames from the received image frame(s) based on the detected action(s).
  • the plurality of optimal image frames are the image frames which include a peak action(s) of the detected action(s).
  • the action-based artefact region localizer (153) predicts a local motion region(s) in the received image frame(s) based on the detected action(s).
  • the action-based artefact region localizer (153) determines an optimal motion map in the plurality of optimal image frames based on the predicted local motion region(s) and the plurality of estimated key points.
  • the action-based artefact region localizer (153) performs localization of spatial-temporal artefacts for the plurality of optimal image frames based on the determined optimal motion map, the determined motion characteristic(s) associated with the plurality of estimated key points and the detected action(s).
  • the localization of the spatial-temporal artefacts refers to locating the spatial-temporal artefacts in a specific location within the optimal motion map.
  • the action-based artefact region localizer (153) identifies the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the localization of spatial-temporal artefacts, where the one or more regions includes an image artefact(s), and the image artefact(s) includes, for example, a blur region, a noise region, a dark region, and a motion region.
  • the action-based artefact region localizer (153) generates an initial motion map of the plurality of optimal image frames based on an image restoration mechanism.
  • the action-based artefact region localizer (153) generates a digital skeleton by connecting the plurality of estimated key points.
  • the action-based artefact region localizer (153) retrieves a motion probability of key points and bones of the generated digital skeleton from a pre-defined dictionary of a database (e.g., key point motion repository (111), bone motion repository (112), etc.) of the electronic device (100) for the detected action(s).
  • the action-based artefact region localizer (153) updates the generated digital skeleton based on the retrieved motion probability of key points and bones.
  • the action-based artefact region localizer (153) determines the optimal motion map based on the predicted local motion region(s), the generated initial motion map and the updated digital skeleton.
  • the action-based artefact region localizer (153) determines a standard deviation of noise of the plurality of optimal image frames using a classical learning mechanism and a deep learning mechanism to identify the one or more regions from the plurality of regions in the received image frame(s) to be enhanced.
  • the action-based artefact region localizer (153) determines at least one static region from the plurality of regions in the at least one received image frame. Further, the action-based artefact region localizer (153) determines at least one variation key point in the at least one static region. In an action such as standing still, the key points are supposed to be static. However, due to error in initial pose estimation, there would be variations in the key-points estimated.
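  • The idea above, that variation of key points which should be static provides an estimate of the estimation error, can be sketched as follows (using the standard deviation over time as the error measure is an assumption).

```python
import numpy as np

def keypoint_estimation_error(static_keypoints):
    """static_keypoints: (T, K, 2) positions of key points that lie in regions
    classified as static. For a truly static subject these should not move, so
    their spread over time serves as an estimate of the estimation error."""
    return np.linalg.norm(static_keypoints.std(axis=0), axis=-1)  # (K,) per-key-point error
```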
  • the action-based artefact region localizer (153) determines a motion parameter(s) of each key point in the predicted local motion region(s) based on pose estimation error and the plurality of estimated key points, where the motion parameter(s) includes a displacement, a velocity, and an acceleration.
  • the action-based artefact region localizer (153) determines a motion between subsequent frames of the plurality of optimal image frames using the determined motion parameter(s).
  • the action-based artefact region localizer (153) determines a size of blur-kernel based on the determined motion to identify the one or more regions from the plurality of regions in the received image frame(s) to be enhanced.
  • the image enhancer (154) generates the enhanced image including the one or more enhanced regions compared to the one or more regions from the plurality of regions of the received image frame(s).
  • the image enhancer (154) generates the enhanced image by applying an image enhancement mechanism that includes, for example, a High Dynamic Range (HDR) image, a de-noised image, a blur-corrected image, and a reflection-removed image.
  • the image enhancer (154) clusters the identified one or more regions from the plurality of regions in the received image frame(s) and the received image frame(s) into a plurality of frame groups based on the determined motion characteristic(s) associated with the plurality of estimated key points and the detected action(s), where the plurality of frame groups includes a number of frames with a lowest displacement, a number of frames with a medium displacement, and a number of frames with a highest displacement.
  • the image enhancer (154) generates a high exposure frame from the number of frames with the lowest displacement.
  • the image enhancer (154) generates a medium exposure frame from the number of frames with the medium displacement.
  • the image enhancer (154) generates a low exposure frame from the number of frames with the highest displacement.
  • the image enhancer (154) blends the generated high exposure frame, the generated medium exposure frame, and the generated low exposure frame to generate the HDR image.
  • the image enhancer (154) generates the de-noised image by utilizing the optimal motion map.
  • the image enhancer (154) determines whether the motion parameter(s) exceeds a pre-defined threshold.
  • the image enhancer (154) applies blur correction to regions surrounding the key points whose measured motion parameters exceed the pre-defined threshold.
  • the image enhancer (154) generates the blur-corrected image by applying the blur correction to regions surrounding the key points whose measured motion parameters exceed the pre-defined threshold.
  • the image enhancer (154) determines a correlation between the determined motion characteristic(s) with the plurality of estimated key points of a first subject with the determined motion characteristic(s) with the plurality of estimated key points of a second subject.
  • the image enhancer (154) classifies a highly correlated key point(s) of the second subject as reflection key points.
  • the image enhancer (154) generates a reflection map using the classified highly correlated key point(s).
  • the image enhancer (154) generates the reflection-removed image using the generated reflection map.
  • the image enhancer (154) compares the computed values of the one or more motion characteristics associated with each of the plurality of estimated key points with expected values, where the expected values are pre-computed. Further, the image enhancer (154) determines a deviation of the computed values of each of the plurality of estimated key points from the expected values. Further, the image enhancer (154) determines a first set of key points of the plurality of estimated key points having the deviation greater than a threshold value. The first set of key points are initial key points computed using an existing method.
  • a function associated with the AI engine (155) may be performed through the non-volatile memory, the volatile memory, and the processor (120).
  • One or a plurality of processors controls the processing of the input data in accordance with a predefined operating rule or AI model stored in the non-volatile memory and the volatile memory.
  • the predefined operating rule or AI model is provided through training or learning.
  • being provided through learning means that, by applying a learning algorithm to a plurality of learning data, a predefined operating rule or AI engine (155) of the desired characteristic is made.
  • the learning may be performed in a device itself in which AI according to an embodiment is performed, and/or may be implemented through a separate server/system.
  • the learning algorithm is a method for training a predetermined target device (for example, a robot) using a plurality of learning data to cause, allow, or control the target device to decide or predict.
  • Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.
  • the AI engine (155) may consist of a plurality of neural network layers. Each layer has a plurality of weight values and performs a layer operation through a calculation of a previous layer and an operation of a plurality of weights.
  • Examples of neural networks include, but are not limited to, Convolutional Neural Network (CNN), Deep Neural Network (DNN), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Bidirectional Recurrent Deep Neural Network (BRDNN), Generative Adversarial Networks (GAN), and Deep Q-Networks.
  • the processor (120) may include the image processing controller (150).
  • the image processing controller (150) is configured to receive the image frame(s) including the subject(s) performing the action(s).
  • the image processing controller (150) is configured to determine the plurality of key points associated with the subject(s) of the received image frame(s).
  • the image processing controller (150) is configured to detect the action(s) performed by the subject(s) using the plurality of estimated key points.
  • the image processing controller (150) is configured to determine the motion characteristic(s) associated with the plurality of estimated key points.
  • the image processing controller (150) is configured to identify the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s).
  • the image processing controller (150) is configured to generate the enhanced image including the one or more enhanced regions compared to the one or more regions from the plurality of regions of the received image frame(s).
  • the image processing controller (150) is configured to store the enhanced image including the one or more enhanced regions of the plurality of regions.
  • the image processing controller (150) is configured to receive the image frame(s) including the subject(s) performing the action(s).
  • the image processing controller (150) is configured to estimate a pose of the subject(s) (e.g., human body) by using the AI engine (155) (deep neural network).
  • the subject(s) consists of the plurality of key points (e.g., k approx key points) for each part of the subject(s) (e.g., head, wrist, etc.).
  • the plurality of key points generated by the pose estimator (151) can only be approximated.
  • the image processing controller (150) is configured to detect the action(s) (e.g., jumps, squats, throws, etc.) performed by the subject(s) using the plurality of estimated key points by using the AI engine (155) and generates an action label(s) corresponding to the detected action(s).
  • the image processing controller (150) is configured to determine a type of image artefact(s) and strength of the one or more regions (e.g., M regions) from the plurality of regions in the optimal image frame(s) (e.g., N frames) to be enhanced based on the action label(s), the plurality of key points, and the received image frame(s).
  • the image processing controller (150) is configured to determine the motion characteristic(s) associated with the plurality of estimated key points, to identify the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s) (e.g., a set of N image frames with M regions of artefacts and a (1 × M) vector denoting the strength of each of these artefacts (or motion contained)), and to generate the enhanced image (best frame) including the one or more enhanced regions compared to the one or more regions from the plurality of regions of the received image frame(s).
  • the image processing controller (150) is configured to minimize the artefacts (image artefact(s)) and generates the enhanced image by utilizing a combination of the received image frame(s).
  • the image processing controller (150) is configured to determine the plurality of optimal image frames (N frames) from the received image frame(s) (601) based on the detected action(s) (action label(s)), where the plurality of optimal image frames includes the peak action(s) of the detected action(s).
  • the image processing controller (150) is configured to identify a peak action(s)/ peak frame for corresponding detected action(s).
  • in a jump action, for example, the peak frame will be the highest point of the jump.
  • in a javelin throw action, for example, the peak frame will be the moment the javelin leaves the hand of the user.
  • the image processing controller (150) is configured to predict the local motion region(s) in the received image frame(s) based on the detected action(s).
  • the image processing controller (150) is configured to identify the region(s) that include high motion for the detected action(s) using a pre-defined look-up table, for example, the limbs in a jump.
  • the image processing controller (150) is configured to determine the optimal motion map (e.g., (x, y) coordinates of the regions around each key point) in the plurality of optimal image frames (N frames) based on the predicted local motion region(s) and the plurality of estimated key points (e.g., set of key points with probable motion).
  • the image processing controller (150) is configured to perform the localization of spatial-temporal artefacts for the plurality of optimal image frames based on the determined optimal motion map, the determined motion characteristic(s) associated with the plurality of estimated key points and the detected action(s) and to identify the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the localization of spatial-temporal artefacts.
  • the image processing controller (150) is configured to determine whether the one or more regions (e.g., M regions and N frames) are corrupted by some artefact (e.g., image artefact) such as noise/blur. the image processing controller (150) is configured to return a strength/counter value/counter action associated with the artefact in response to determining that the one or more regions are corrupted by some artefact. For example, when the image processing controller (150) is configured to detect that one or more regions are corrupted by the artefact (i.e. blur region), the image processing controller (150) is configured to return a kernel size representing the strength of the blur. In another example, when the image processing controller (150) is configured to detect that one or more regions are corrupted by the artefact (i.e. noise), the image processing controller (150) is configured to return a standard deviation of the noise.
  • the image processing controller (150) is configured to receive the image frame(s) (701) including the subject(s) performing the action(s).
  • the image processing controller (150) is configured to identify the peak action(s)/ the peak frame(s) (702) from the received image frame(s) (701) for corresponding detected action(s) based on the action label(s). In the jump action, for example, the peak frame(s) (702) will be the highest point of the jump.
  • the image processing controller (150) is configured to predict the local motion region(s) (703) in the received image frame(s) (701) based on the detected action(s).
  • the local motion region(s) (703) includes the high motion (e.g., motion associated with legs) for the detected action(s) using the pre-defined look-up table.
  • the image processing controller (150) is configured to generate the initial motion map of the plurality of optimal image frames (801 and 802) based on the image restoration mechanism (e.g., HDR/motion de-blurring).
  • the image processing controller (150) is configured to generate the digital skeleton by connecting the plurality of estimated key points.
  • the image processing controller (150) is configured to retrieve the motion probability of key points and bones of the generated digital skeleton from the pre-defined dictionary of the database (i.e., key point motion repository (111) and bone motion repository (112)) of the electronic device (100) for the detected action(s).
  • the motion probability/values of key points and bones are chosen from a pre-computed Look-Up Table (LUT) for each action.
  • the image processing controller (150) is configured to update the generated digital skeleton based on the retrieved motion probability of key points and bones.
  • the image processing controller (150) is configured to perform a dilation process on the updated digital skeleton.
  • the image processing controller (150) is configured to perform a smoothing process on the dilated digital skeleton (804).
  • the image processing controller (150) is configured to determine the optimal motion map (805) based on the predicted local motion region/the motion probability, the generated initial motion map and the updated/ dilated/ smoothed digital skeleton (804).
  • the optimal motion map (805) is generated by combining values of the initial motion map and the motion probability.
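  • The skeleton-based refinement in the preceding steps might look roughly like the sketch below; the contents of the action look-up table, the dilation size, the Gaussian smoothing, and the element-wise maximum used to combine the maps are all assumptions made only for illustration.

```python
import numpy as np
from scipy.ndimage import grey_dilation, gaussian_filter

# Hypothetical look-up table: per action, the motion probability of each key point.
ACTION_KEYPOINT_MOTION = {
    "jump": {"ankle": 0.9, "knee": 0.8, "wrist": 0.5, "head": 0.2},
}

def optimal_motion_map(initial_map, keypoints, action, dilate_px=9, sigma=3.0):
    """initial_map: (H, W) map in [0, 1] from the image-restoration pipeline.
    keypoints: dict of key point name -> (x, y) pixel coordinates.

    A sparse skeleton raster is built from the key points, weighted by the
    per-action motion probabilities, dilated and smoothed, and then combined
    with the initial motion map (here by taking the element-wise maximum)."""
    skeleton = np.zeros_like(initial_map)
    probs = ACTION_KEYPOINT_MOTION.get(action, {})
    for name, (x, y) in keypoints.items():
        skeleton[int(y), int(x)] = probs.get(name, 0.0)
    dilated = grey_dilation(skeleton, size=(dilate_px, dilate_px))  # thicken the skeleton
    smoothed = gaussian_filter(dilated, sigma=sigma)                # soften region edges
    return np.clip(np.maximum(initial_map, smoothed), 0.0, 1.0)
```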
  • the image processing controller (150) is configured to detect that one or more regions are corrupted by the artefact (i.e. noise) in the plurality of optimal image frames (N frames). The image processing controller (150) then determines the standard deviation of noise of the plurality of optimal image frames using the classical learning mechanism and the deep learning mechanism. The image processing controller (150) then returns the strength/counter value/counter action associated with the artefact to image processing controller (150) in response to determining that the one or more regions is corrupted by some artefact (i.e. noise).
  • the image processing controller (150) is configured to determine the motion parameter(s) (e.g., a displacement, a velocity, and an acceleration) of each key point in the predicted local motion region(s) based on the pose estimation error and the plurality of estimated key points. The image processing controller (150) is configured to determine the pose estimation error by analysing a variation of key points in a static region(s) in the plurality of optimal image frames (the previous stage gives an estimate of low/no motion regions which can be used to determine the pose estimation error).
  • the image processing controller (150) is configured to determine the motion between subsequent frames of the plurality of optimal image frames using the determined motion parameter(s). The image processing controller (150) then determines the size of blur-kernel based on the determined motion to identify the one or more regions from the plurality of regions in the received image frame(s) to be enhanced. The image processing controller (150) then returns the strength/counter value/counteraction associated with the artefact to the image processing controller (150) in response to determining that the one or more regions are corrupted by some artefact (i.e. blur).
  • the image processing controller (150) is configured to cluster the identified one or more regions from the plurality of regions in the received image frame(s) and the received image frame(s) into a plurality of frame groups based on the determined motion characteristic(s) associated with the plurality of estimated key points and detected action(s).
  • the plurality of frame groups includes the number of frames with the lowest displacement (K1), the number of frames with the medium displacement (K2), and the number of frames with the highest displacement (K3).
  • The relation among them is K1 > K2 > K3.
  • the image processing controller (150) is configured to generate the high exposure frame (K4) from the number of frames with the lowest displacement.
  • the image processing controller (150) is configured to generate the medium exposure frame (K5) from the number of frames with the medium displacement.
  • the image processing controller (150) is configured to generate the low exposure frame (K6) from the number of frames with the highest displacement.
  • the frames (K4, K5, and K6) are added using a weighted addition of all the frames. The weighted addition is performed using the motion map to remove ghosts while blending.
  • the image processing controller (150) is configured to blend the generated high exposure frame (K4), the generated medium exposure frame (K5), and the generated low exposure frame (K6) to generate the HDR image (1002).
  • High, medium, and low exposure images/frames are created by blending frames based on the displacement of key points, to reduce ghosting in the HDR image. A smaller number of frames with large displacement is blended to create the low exposure frame, and vice versa.
  • the image processing controller (150) is configured to receive the N image frames and information associated with the artefact measure for the M regions and the N frames.
  • the image processing controller (150) is configured to determine an average blur in each image frame based on a present blur region(s).
  • the image processing controller (150) is configured to sort image frames in ascending order of the average blur.
  • the image processing controller (150) is configured to store the sorted image frames in the memory (110).
  • the image processing controller (150) is configured to retrieve one or more image frames from the sorted image frames.
  • the image processing controller (150) is configured to determine a maximum blur from the retrieved one or more image frames.
  • the image processing controller (150) is configured to determine whether the maximum blur is less than a pre-defined threshold (t). If the maximum blur is less than the pre-set threshold (t), the image enhancer (154) returns the best frame (i th Frame). Otherwise, the sorted image frame (1104-1105) list is checked until this constraint is met.
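  • A sketch of the frame-selection loop above; the per-region blur measure is assumed to be available already (e.g. the blur-kernel sizes estimated earlier), and the names are illustrative.

```python
def select_best_frame(frames, blur_per_region, threshold):
    """frames: list of N frames; blur_per_region: list of N lists, each holding
    the blur measure of the M regions of that frame. The frames are sorted in
    ascending order of average blur, and the first frame whose worst region is
    below the threshold is returned; otherwise the least blurred frame is used."""
    order = sorted(range(len(frames)),
                   key=lambda i: sum(blur_per_region[i]) / len(blur_per_region[i]))
    for i in order:
        if max(blur_per_region[i]) < threshold:
            return frames[i]
    return frames[order[0]]  # fall back to the least blurred frame overall
```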
  • the image processing controller (150) is configured to generate a motion map based on estimated displacement upon receiving the N image frames and the image artefact (motion)/ measure for the M regions in the N frames.
  • the image processing controller (150) generates a motion map that represents regions where the motion of the subject(s) is detected.
  • the motion map is typically a greyscale image with values ranging from 0 to 255. The higher the value, the higher the confidence of motion in that region.
  • the motion map is generated using artefact measurements for the N frames.
  • the image processing controller (150) is configured to compensate for multi-frame motion noise reduction upon receiving the motion map and generates the de-noised image.
  • the image processing controller (150) is configured to use the motion map to blend the N frames together using a weighted addition. While blending, regions with more motion are given less weightage.
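  • The motion-weighted blend above might look like the following sketch, where the weight falling off linearly with the motion-map value is an assumption made only for this example.

```python
import numpy as np

def motion_compensated_denoise(frames, motion_maps):
    """frames: (N, H, W) float array; motion_maps: (N, H, W) values in [0, 255],
    where larger values mean higher confidence of motion. The frames are
    blended with a weighted addition that gives less weight to pixels with
    more motion, so moving regions do not produce ghosts."""
    frames = np.asarray(frames, dtype=np.float64)
    weights = 1.0 - np.asarray(motion_maps, dtype=np.float64) / 255.0
    weights = np.clip(weights, 1e-3, 1.0)  # keep weights strictly positive
    return (weights * frames).sum(axis=0) / weights.sum(axis=0)
```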
  • the image processing controller (150) is configured to receive the image frame(s) including the subject(s) performing the action(s).
  • the image processing controller (150) is configured to perform the exposure alignment on the received image frame(s) and to generate the initial motion map.
  • the image processing controller (150) is configured to determine the plurality of key points (human pose/digital skeleton) associated with the subject(s) of the received image frame(s).
  • the image processing controller (150) is configured to update the determined digital skeleton based on the retrieved motion probability of key points and bones.
  • the image processing controller (150) is configured to generate an intermediate motion map based on the generated initial motion map.
  • the image processing controller (150) is configured to generate the optimal/final motion map based on the intermediate motion map and the updated digital skeleton/the plurality of key points; the optimal/final motion map is generated by combining the values of the initial motion map and the motion probability.
  • FIG. 3 shows various hardware components of the electronic device (100), but it is to be understood that other embodiments are not limited thereto.
  • the electronic device (100) may include a fewer or greater number of components.
  • the labels or names of the components are used only for illustrative purposes and do not limit the scope of the invention.
  • One or more components can be combined to perform the same or substantially similar functions for the motion-based image enhancement.
  • FIG. 4 is a flow diagram (400) illustrating a method for the motion-based image enhancement, according to an embodiment as disclosed herein.
  • the electronic device (100) performs various steps for the motion-based image enhancement.
  • the method includes receiving the image frame(s) including the subject(s) performing the action(s).
  • the method includes determining the plurality of key points associated with the subject(s) of the received image frame(s).
  • the method includes detecting the action(s) performed by the subject(s) using the plurality of estimated key points.
  • the method includes determining the motion characteristic(s) associated with the plurality of estimated key points.
  • the method includes identifying the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s).
  • the method includes generating the enhanced image including the one or more enhanced regions compared to the one or more regions from the plurality of regions of the received image frame(s).
  • the method includes storing the enhanced image including the one or more enhanced regions of the plurality of regions.
  • FIG. 5 is a system flow diagram illustrating the method for the motion-based image enhancement, according to an embodiment as disclosed herein.
  • the pose estimator (151) receives the image frame(s) including the subject(s) performing the action(s).
  • the pose estimator (151) estimates a pose of the subject(s) (e.g., human body) by using the AI engine (155) (deep neural network).
  • the subject(s) consists of the plurality of key points (e.g., approximately k key points) for each part of the subject(s) (e.g., head, wrist, etc.).
  • the plurality of key points generated by the pose estimator (151) is only an approximation.
  • the action recognizer (152) detects the action(s) (e.g., jumps, squats, throws, etc.) performed by the subject(s) using the plurality of estimated key points by using the AI engine (155) and generates an action label(s) corresponding to the detected action(s).
  • the action-based artefact region localizer (153) determines a type of image artefact(s) and strength of the one or more regions (e.g., M regions) from the plurality of regions in the optimal image frame(s) (e.g., N frames) to be enhanced based on the action label(s), the plurality of key points, and the received image frame(s).
  • FIG. 6 shows additional information about the action-based artefact region localizer (153).
  • the image enhancer (154) determines the motion characteristic(s) associated with the plurality of estimated key points, identifies the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s) (e.g., a set of N image frames with M regions of artefacts and a (1 × M) vector denoting the strength of each of these artefacts (or motion contained)), and generates the enhanced image (best frame) including the one or more enhanced regions compared to the one or more regions from the plurality of regions of the received image frame(s). The key-point displacement used as the motion characteristic is illustrated in the sketch below.
  • the image enhancer (154) minimizes the artefacts (image artefact(s)) and generates the enhanced image by utilizing a combination of the received image frame(s).
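  • To make the motion characteristic concrete, the per-key-point displacement between consecutive frames can be computed as in the following minimal Python/NumPy sketch. The (N, K, 2) array layout and the function name are illustrative assumptions about the output of the pose estimator (151), not part of the disclosure.

    import numpy as np

    def key_point_displacement(key_points):
        """key_points: (N, K, 2) array of (x, y) positions of K key points over
        N frames (assumed pose-estimator output format).
        Returns an (N-1, K) array of per-key-point displacement between
        consecutive frames, a simple motion characteristic."""
        key_points = np.asarray(key_points, dtype=np.float32)
        deltas = np.diff(key_points, axis=0)        # frame-to-frame (dx, dy)
        return np.linalg.norm(deltas, axis=-1)      # Euclidean displacement

    # Example: 4 frames, 3 key points, two of which drift to the right.
    kp = np.array([[[10, 20], [30, 40], [50, 60]],
                   [[12, 20], [33, 41], [50, 60]],
                   [[15, 21], [37, 42], [50, 61]],
                   [[19, 22], [42, 44], [51, 61]]], dtype=np.float32)
    print(key_point_displacement(kp).mean(axis=1))  # average displacement per step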
  • FIG. 6 illustrates various operations associated with the action-based artefact region localizer (153) for the motion-based image enhancement, according to an embodiment as disclosed herein.
  • the action-based artefact region localizer (153) includes a peak action identifier (153a), a local motion predictor (153b), a region identifier for motion localizer (153c), and a spatial temporal artefacts localizer (153d).
  • the peak action identifier (153a) determines the plurality of optimal image frames (N frames) from the received image frame(s) (601) based on the detected action(s) (action label(s)), where the plurality of optimal image frames includes the peak action(s) of the detected action(s).
  • the peak action identifier (153a) identifies a peak action(s)/peak frame for the corresponding detected action(s). In a jump action, for example, the peak frame is the highest point of the jump; in a javelin throw action, the peak frame is the moment the javelin leaves the user's hand (see the peak-frame sketch below).
  • the local motion predictor (153b) predicts the local motion region(s) in the received image frame(s) based on the detected action(s).
  • the local motion region(s) includes the regions with high motion for the detected action(s), identified using a pre-defined look-up table (for example, the limbs in a jump).
  • the region identifier for motion localizer (153c) determines the optimal motion map (e.g., (x, y) coordinates of the regions around each key point) in the plurality of optimal image frames (N frames) based on the predicted local motion region(s) and the plurality of estimated key points (e.g., set of key points with probable motion).
  • the spatial-temporal artefacts localizer (153d) performs the localization of spatial-temporal artefacts for the plurality of optimal image frames based on the determined optimal motion map, the determined motion characteristic(s) associated with the plurality of estimated key points and the detected action(s) and identifies the one or more regions from the plurality of regions in the received image frame(s) to be enhanced based on the localization of spatial-temporal artefacts.
  • the spatial-temporal artefacts localizer (153d) determines whether the one or more regions (e.g., M regions and N frames) are corrupted by some artefact (e.g., image artefact) such as noise/blur.
  • the spatial-temporal artefacts localizer (153d) returns a strength/counter value/counter action associated with the artefact in response to determining that the one or more regions are corrupted by some artefact. For example, when the spatial-temporal artefacts localizer (153d) detects that one or more regions are corrupted by the artefact (i.e. blur region), the spatial-temporal artefacts localizer (153d) returns a kernel size representing the strength of the blur.
  • similarly, when the spatial-temporal artefacts localizer (153d) detects that one or more regions are corrupted by the artefact (i.e., noise), the spatial-temporal artefacts localizer (153d) returns a standard deviation of the noise.
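  • As an illustration of the peak-frame idea, the following sketch selects the peak of a jump as the frame in which a reference key point (e.g., the hip) is highest in the image, i.e., has the smallest y coordinate. The key-point index and the image-coordinate convention (y grows downward) are assumptions for illustration only.

    import numpy as np

    def jump_peak_frame(key_points, hip_index=0):
        """key_points: (N, K, 2) array of (x, y) key-point positions over N frames.
        Returns the index of the frame in which the hip key point is highest in
        the image (smallest y), taken here as the peak of a jump action."""
        key_points = np.asarray(key_points, dtype=np.float32)
        hip_y = key_points[:, hip_index, 1]   # vertical hip position per frame
        return int(np.argmin(hip_y))          # y decreases upward in image coordinates

    # Example: the hip rises until frame 2, then falls back down.
    kp = np.zeros((5, 1, 2), dtype=np.float32)
    kp[:, 0, 1] = [100, 80, 60, 75, 95]
    print(jump_peak_frame(kp))                # -> 2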
  • FIG. 7 illustrates various operations associated with the peak action identifier (153a) and the local motion predictor (153b) for the motion-based image enhancement, according to an embodiment as disclosed herein.
  • the peak action identifier (153a) receives the image frame(s) (701) including the subject(s) performing the action(s).
  • the peak action identifier (153a) identifies the peak action(s)/ the peak frame(s) (702) from the received image frame(s) (701) for corresponding detected action(s) based on the action label(s). In the jump action, for example, the peak frame(s) (702) will be the highest point of the jump.
  • the local motion predictor (153b) predicts the local motion region(s) (703) in the received image frame(s) (701) based on the detected action(s).
  • the local motion region(s) (703) includes the regions with high motion (e.g., motion associated with the legs) for the detected action(s), identified using the pre-defined look-up table (see the look-up-table sketch below).
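  • A minimal sketch of such a pre-defined look-up table is shown below. The action labels, key-point names, and motion probabilities are placeholders chosen for illustration; the actual contents of the table are not specified by the disclosure.

    # Hypothetical look-up table: action label -> key points expected to carry
    # high motion, with an illustrative motion probability for each.
    LOCAL_MOTION_LUT = {
        "jump":  {"left_ankle": 0.9, "right_ankle": 0.9, "left_knee": 0.8, "right_knee": 0.8},
        "squat": {"left_knee": 0.9, "right_knee": 0.9, "hip": 0.7},
        "throw": {"right_wrist": 0.95, "right_elbow": 0.85, "right_shoulder": 0.7},
    }

    def predict_local_motion(action_label):
        """Return the key points (and motion probabilities) expected to carry
        high motion for the detected action, or an empty dict if unknown."""
        return LOCAL_MOTION_LUT.get(action_label, {})

    print(predict_local_motion("jump"))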
  • FIG. 8 illustrates various operations associated with the region identifier for motion localizer (153c) for the motion-based image enhancement, according to an embodiment as disclosed herein.
  • the region identifier for motion localizer (153c) includes an initial motion map creator (153ca), a digital skeleton creator (153cb), a key/bone intensity updater (153cc), a dilate engine (153cd), a Gaussian smoother (153ce), and a final motion map creator (153cf).
  • the initial motion map creator (153ca) generates the initial motion map of the plurality of optimal image frames (801 and 802) based on the image restoration mechanism (e.g., HDR/motion de-blurring).
  • the digital skeleton creator (153cb) generates the digital skeleton by connecting the plurality of estimated key points.
  • the key/bone intensity updater (153cc) retrieves the motion probability of key points and bones of the generated digital skeleton from the pre-defined dictionary of the database (i.e., key point motion repository (111) and bone motion repository (112)) of the electronic device (100) for the detected action(s).
  • the motion probability/values of key points and bones are chosen from a pre-computed Look-Up Table (LUT) for each action.
  • the key/bone intensity updater (153cc) updates the generated digital skeleton based on the retrieved motion probability of key points and bones.
  • the dilate engine (153cd) performs a dilation process on the updated digital skeleton.
  • the Gaussian smoother (153ce) performs a smoothing process on the dilated digital skeleton (804).
  • the final motion map creator (153cf) determines the optimal motion map (805) based on the predicted local motion region/the motion probability, the generated initial motion map and the updated/ dilated/ smoothed digital skeleton (804).
  • the final motion map (805) is generated by the final motion map creator (153cf) by combining the values of the initial motion map and the motion probability (see the motion-map sketch below).
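  • The chain of FIG. 8 (drawing the digital skeleton with per-bone intensities taken from the look-up table, dilating it, smoothing it, and combining it with the initial motion map) can be sketched with OpenCV as below. The bone list, kernel sizes, and the per-pixel maximum used to combine the two maps are assumptions for illustration; the disclosure states only that the values are combined.

    import cv2
    import numpy as np

    def skeleton_motion_map(initial_map, key_points, bones, bone_motion):
        """initial_map : (H, W) uint8 motion map from exposure alignment (0-255).
        key_points  : dict name -> (x, y) pixel coordinates.
        bones       : list of (name_a, name_b) pairs connecting key points.
        bone_motion : dict (name_a, name_b) -> motion probability in [0, 1],
                      e.g. looked up per action from the key point / bone
                      motion repositories (111, 112).
        Returns the final motion map as an (H, W) uint8 image."""
        skeleton = np.zeros(initial_map.shape, dtype=np.uint8)
        for a, b in bones:
            intensity = int(255 * bone_motion.get((a, b), 0.0))
            cv2.line(skeleton, tuple(key_points[a]), tuple(key_points[b]),
                     color=intensity, thickness=3)

        skeleton = cv2.dilate(skeleton, np.ones((15, 15), np.uint8))   # thicken the skeleton
        skeleton = cv2.GaussianBlur(skeleton, (21, 21), 0)             # smooth the edges

        # Combine with the initial motion map; a per-pixel maximum is one simple choice.
        return np.maximum(initial_map, skeleton)

    # Example with a single "knee-ankle" bone carrying a high motion probability.
    kps = {"knee": (60, 40), "ankle": (60, 90)}
    final = skeleton_motion_map(np.zeros((120, 120), np.uint8), kps,
                                bones=[("knee", "ankle")],
                                bone_motion={("knee", "ankle"): 0.9})
    print(int(final.max()))   # high values along the dilated, smoothed bone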
  • FIG. 9 illustrates various operations associated with the spatial-temporal artefacts localizer (153d) for the motion-based image enhancement, according to an embodiment as disclosed herein.
  • the spatial-temporal artefacts localizer (153d) includes a noise analyzer (153da), a motion analyzer (153db), a pose controller (153dc), and a blur kernel (153dd).
  • the noise analyzer (153da) detects that one or more regions are corrupted by the artefact (i.e. noise) in the plurality of optimal image frames (N frames). The noise analyzer (153da) then determines the standard deviation of noise of the plurality of optimal image frames using the classical learning mechanism and the deep learning mechanism. The noise analyzer (153da) then returns the strength/counter value/counter action associated with the artefact to the image enhancer (154) in response to determining that the one or more regions are corrupted by some artefact (i.e. noise).
  • the motion analyzer (153db) determines the motion parameter(s) (e.g., a displacement, a velocity, and an acceleration) of each key point in the predicted local motion region(s) based on the pose-estimation error and the plurality of estimated key points.
  • the pose controller (153dc) determines the pose-estimation error by analysing a variation of key points in a static region(s) in the plurality of optimal image frames (the previous stage gives an estimate of low/no motion regions which can be used to determine the pose-estimation error).
  • the blur kernel (153dd) determines the motion between subsequent frames of the plurality of optimal image frames using the determined motion parameter(s). The blur kernel (153dd) then determines the size of the blur kernel based on the determined motion to identify the one or more regions from the plurality of regions in the received image frame(s) to be enhanced. The blur kernel (153dd) then returns the strength/counter value/counter action associated with the artefact to the image enhancer (154) in response to determining that the one or more regions are corrupted by some artefact (i.e. blur) (see the motion-parameter sketch below).
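  • The following sketch shows how the displacement, velocity, and acceleration of one key point can be obtained by finite differences, and how the inter-frame motion might be mapped to a blur-kernel size. Using the maximum inter-frame displacement, reduced by the estimated pose-estimation error and rounded to an odd integer, is an assumed heuristic rather than a formula stated in the disclosure.

    import numpy as np

    def motion_parameters(track, fps=30.0):
        """track: (N, 2) array of one key point's (x, y) position over N frames.
        Returns per-step displacement (pixels), velocity (pixels/s), and
        acceleration (pixels/s^2) estimated by finite differences."""
        track = np.asarray(track, dtype=np.float32)
        displacement = np.linalg.norm(np.diff(track, axis=0), axis=-1)
        velocity = displacement * fps
        acceleration = np.diff(velocity) * fps
        return displacement, velocity, acceleration

    def blur_kernel_size(displacement, pose_error=1.0):
        """Assumed heuristic: largest inter-frame displacement minus the
        estimated pose-estimation error, rounded to an odd kernel size."""
        size = int(round(max(displacement.max() - pose_error, 1.0)))
        return size if size % 2 == 1 else size + 1

    track = np.array([[10, 50], [14, 49], [21, 47], [31, 46]], dtype=np.float32)
    disp, vel, acc = motion_parameters(track)
    print(disp, blur_kernel_size(disp))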
  • FIG. 10 illustrates various operations associated with the image enhancer (154) to generate the HDR image, according to an embodiment as disclosed herein.
  • the image enhancer (154) includes a cluster generator (154a), a motion compensated addition-1 (154b), a motion compensated addition-2 (154c), a motion compensated addition-3 (154d), and a HDR merger (154e).
  • the cluster generator (154a) clusters the identified one or more regions from the plurality of regions in the received image frame(s) and the received image frame(s) into a plurality of frame groups based on the determined motion characteristic(s) associated with the plurality of estimated key points and detected action(s).
  • the plurality of frame groups includes the number of frames with the lowest displacement (K1), the number of frames with the medium displacement (K2), and the number of frames with the highest displacement (K3).
  • the relation among these frame counts is K1 > K2 > K3; that is, the larger the displacement of a group, the fewer frames it contains.
  • the cluster generator (154a) generates the high exposure frame (K4) from the number of frames with the lowest displacement.
  • the cluster generator (154a) generates the medium exposure frame (K5) from the number of frames with the medium displacement.
  • the cluster generator (154a) generates the low exposure frame (K6) from the number of frames with the highest displacement.
  • the frames (K4, K5, and K6) are added using a weighted addition of all the frames. The weighted addition is performed using the motion map to remove ghosts while blending.
  • the HDR merger (154e) blends the generated high exposure frame (K4), the generated medium exposure frame (K5), and the generated low exposure frame (K6) to generate the HDR image (1002).
  • High, medium, and low exposure frames are created by blending frames based on the displacement of the key points, which reduces ghosting in the HDR result. A smaller number of frames (those with large displacement) is blended to create the low exposure frame, and vice-versa (see the clustering-and-blending sketch below).
  • a comparison (1000) between a conventional HDR image (1001) and a proposed HDR image (1002) is illustrated in FIG. 10. In comparison to the conventional HDR image (1001), the proposed HDR (1002) has no ghosting effect/image artefact.
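  • The clustering-and-blending flow of FIG. 10 can be sketched as follows: frames are grouped by their average key-point displacement, each group is blended with weights that downweight moving pixels (using the motion map) to suppress ghosts, and the three blended frames are merged. The gain factors used to emulate high/medium/low exposure, the even split into groups, and the simple averaging merge are illustrative assumptions; the disclosure specifies only displacement-based grouping (with K1 > K2 > K3 frames) and motion-map-weighted addition.

    import numpy as np

    def motion_weighted_blend(frames, motion_maps):
        """Weighted addition of frames; pixels with more motion get less weight,
        which suppresses ghosting. frames: list of (H, W, 3) float32 images in
        [0, 1]; motion_maps: list of (H, W) uint8 maps (0-255)."""
        acc = np.zeros_like(frames[0], dtype=np.float32)
        weight_sum = np.zeros(frames[0].shape[:2], dtype=np.float32)
        for img, mm in zip(frames, motion_maps):
            w = 1.0 - mm.astype(np.float32) / 255.0    # low weight where motion is high
            acc += img * w[..., None]
            weight_sum += w
        return acc / np.maximum(weight_sum[..., None], 1e-6)

    def hdr_from_displacement(frames, motion_maps, displacements, gains=(2.0, 1.0, 0.5)):
        """Group frames by average key-point displacement (low/medium/high),
        blend each group, apply an assumed gain to emulate high/medium/low
        exposure, and merge the three results by simple averaging."""
        order = np.argsort(displacements)               # lowest displacement first
        groups = np.array_split(order, 3)               # the disclosure uses K1 > K2 > K3
        merged = []
        for idx, gain in zip(groups, gains):
            blended = motion_weighted_blend([frames[i] for i in idx],
                                            [motion_maps[i] for i in idx])
            merged.append(np.clip(blended * gain, 0.0, 1.0))
        return np.mean(merged, axis=0)

    frames = [np.random.rand(32, 32, 3).astype(np.float32) for _ in range(6)]
    maps = [np.zeros((32, 32), np.uint8) for _ in range(6)]
    hdr = hdr_from_displacement(frames, maps, displacements=[1, 2, 3, 5, 8, 13])
    print(hdr.shape)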
  • FIG. 11 is a flow diagram (1100) illustrating a method for generating the blur-corrected image using the image enhancer (154), according to an embodiment as disclosed herein.
  • the method includes receiving the N image frames and information associated with the artefact measure for the M regions and the N frames.
  • the method includes determining an average blur in each image frame based on a present blur region(s).
  • the method includes sorting image frames in ascending order of the average blur.
  • the method includes storing the sorted image frames in the memory (110).
  • the method includes retrieving one or more image frames from the sorted image frames.
  • the method includes determining a maximum blur from the retrieved one or more image frames.
  • the method includes determining whether the maximum blur is less than a pre-defined threshold (t). If the maximum blur is less than the threshold (t), the image enhancer (154) returns the best frame (the i-th frame). Otherwise, the sorted image frame list (1104-1105) is traversed until this constraint is met (see the frame-selection sketch below).
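  • The selection loop of FIG. 11 can be sketched as below. The per-frame blur score is approximated here by the inverse variance of the Laplacian, a common sharpness proxy; the disclosure itself relies on the artefact measure computed for the M regions, so this metric is only a stand-in.

    import cv2
    import numpy as np

    def blur_score(gray):
        """Higher value = more blur; the inverse variance of the Laplacian is a
        stand-in for the per-region artefact measure."""
        return 1.0 / (cv2.Laplacian(gray, cv2.CV_64F).var() + 1e-6)

    def best_frame(frames, threshold):
        """Sort frames by average blur (ascending) and return the first frame
        whose blur is below the pre-defined threshold t; if the constraint is
        never met, fall back to the least blurred frame."""
        scores = [blur_score(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)) for f in frames]
        order = np.argsort(scores)                  # ascending blur
        for i in order:
            if scores[i] < threshold:
                return frames[i]                    # the best (i-th) frame
        return frames[order[0]]

    frames = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(4)]
    print(best_frame(frames, threshold=0.5).shape)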
  • FIG. 12 illustrates various operations associated with the image enhancer (154) to generate the de-noised image, according to an embodiment as disclosed herein.
  • the image enhancer (154) includes a displacement motion mapper (154f) and a noise reduction engine (154g).
  • the displacement motion mapper (154f) generates a motion map based on the estimated displacement upon receiving the N image frames and the image artefact (motion) measure for the M regions in the N frames.
  • the motion map represents regions where the motion of the subject(s) is detected.
  • the motion map is typically a greyscale image with values ranging from 0 to 255. The higher the value, the higher the confidence of motion in that region.
  • the motion map is generated using artefact measurements for the N frames.
  • the noise reduction engine (154g) performs motion-compensated multi-frame noise reduction upon receiving the motion map and generates the de-noised image.
  • the noise reduction engine (154g) uses the motion map to blend the N frames together using a weighted addition. While blending, regions with more motion are given less weightage. This ensures that there will be no ghosting or blurring in the final output (de-noised image). Furthermore, the noise will be greatly reduced in regions where there is no motion (see the weighted-blending sketch below).
  • a comparison (1200) between a conventional de-noised image (1201) and a proposed de-noised image (1202) is illustrated in FIG. 12. In comparison to the conventional de-noised image (1201), the proposed de-noised image (1202) has no noise effect/image artefact.
  • the photometric difference between the images is used in conventional motion map generation methods. Due to the presence of high noise, this can result in large false-positive motion regions. As a result, the final output is noisy (1201). In contrast, the proposed method detects motion regions accurately, so frames with minimal motion can be chosen for motion-compensated noise reduction (1202).
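  • A sketch of the motion-map-weighted blend used for noise reduction follows: static pixels are averaged across all frames (strong noise reduction), while pixels flagged as moving are taken mostly from a reference frame, so no ghosting is introduced. The choice of the first frame as the reference and the linear mixing rule are assumptions for illustration; the frames are assumed to be already aligned.

    import numpy as np

    def motion_compensated_denoise(frames, motion_map, ref_index=0):
        """frames: list of aligned (H, W, 3) float32 images in [0, 1].
        motion_map: (H, W) uint8 map, 255 = high confidence of motion.
        Static pixels get the temporal mean (denoised); moving pixels fall back
        to the reference frame (ghost-free)."""
        stack = np.stack(frames).astype(np.float32)                 # (N, H, W, 3)
        alpha = motion_map.astype(np.float32)[..., None] / 255.0    # 1 where motion
        temporal_mean = stack.mean(axis=0)                          # denoised but ghost-prone
        reference = stack[ref_index]                                # ghost-free but noisy
        return alpha * reference + (1.0 - alpha) * temporal_mean

    frames = [np.random.rand(48, 48, 3).astype(np.float32) for _ in range(5)]
    motion_map = np.zeros((48, 48), np.uint8)
    motion_map[10:20, 10:20] = 255                                  # a moving region
    print(motion_compensated_denoise(frames, motion_map).shape)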
  • FIG. 13a and FIG. 13b are an example flow diagram (1300) illustrating the method for the motion-based image enhancement, according to an embodiment as disclosed herein.
  • the electronic device (100) performs various steps for the motion-based image enhancement.
  • the method includes receiving the image frame(s) including the subject(s) performing the action(s).
  • the method includes performing the exposure alignment on the received image frame(s) and generating the initial motion map.
  • the method includes determining the plurality of key points (human pose/digital skeleton) associated with the subject(s) of the received image frame(s).
  • the method includes updating the determined digital skeleton based on the retrieved motion probability of key points and bones.
  • the method includes generating an intermediate motion map based on the generated initial motion map.
  • the method includes generating the optimal/final motion map based on the intermediate motion map and the updated digital skeleton/the plurality of key points; the optimal/final motion map is generated by combining the values of the initial motion map and the motion probability.
  • the embodiments disclosed herein can be implemented using at least one hardware device performing network management functions to control the elements.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Studio Devices (AREA)

Abstract

Accordingly, the embodiments herein provide a method for motion-based image enhancement by an electronic device (100). The method includes receiving a plurality of images comprising one or more subjects performing one or more actions. The method includes determining the plurality of key points associated with the subject(s) of the plurality of images and detecting the action(s) performed by the subject(s) using the plurality of estimated key points. The method includes determining one or more motion characteristics associated with the plurality of estimated key points. The method includes identifying one or more regions from a plurality of regions in the plurality of images to be enhanced based on the determined motion characteristic(s) with the plurality of estimated key points and the detected action(s). The method includes generating an enhanced image comprising the one or more enhanced regions compared to the one or more regions from the plurality of regions of the received image(s).
PCT/KR2023/012652 2022-08-26 2023-08-25 Method and electronic device for motion-based image enhancement WO2024043752A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202241048869 2022-08-26
IN202241048869 2023-07-04

Publications (1)

Publication Number Publication Date
WO2024043752A1 true WO2024043752A1 (fr) 2024-02-29

Family

ID=90014171

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2023/012652 WO2024043752A1 (fr) 2022-08-26 2023-08-25 Method and electronic device for motion-based image enhancement

Country Status (1)

Country Link
WO (1) WO2024043752A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101433472B1 (ko) * 2012-11-27 2014-08-22 경기대학교 산학협력단 Apparatus, method, and computer-readable recording medium for context-awareness-based object detection, recognition, and tracking
US20170083748A1 (en) * 2015-09-11 2017-03-23 SZ DJI Technology Co., Ltd Systems and methods for detecting and tracking movable objects
US20190045126A1 (en) * 2017-08-03 2019-02-07 Canon Kabushiki Kaisha Image pick-up apparatus and control method
KR102118937B1 (ko) * 2018-12-05 2020-06-04 주식회사 스탠스 3D data service device, method of driving 3D data service device, and computer-readable recording medium
US20200267300A1 (en) * 2019-02-15 2020-08-20 Samsung Electronics Co., Ltd. System and method for compositing high dynamic range images

Similar Documents

Publication Publication Date Title
WO2019098414A1 Method and device for hierarchical learning of neural network based on weakly supervised learning
WO2018212494A1 Method and device for identifying objects
US7526101B2 (en) Tracking objects in videos with adaptive classifiers
WO2018135881A1 Vision intelligence management for electronic devices
WO2019098449A1 Apparatus related to data classification based on metric learning, and method thereof
JP5498454B2 Tracking device, tracking method, and program
JP7172472B2 Rule generation device, rule generation method, and rule generation program
JP2012230686A Behavior recognition system
US20200089958A1 (en) Image recognition method and apparatus, electronic device, and readable storage medium
CN112543936B Action-structure self-attention graph convolutional network model for action recognition
WO2021118270A1 Method and electronic device for deblurring blurred image
WO2020004815A1 Method for detecting an anomaly in data
KR20200010971A Apparatus and method for detecting moving object using optical flow estimation
JP7446060B2 Information processing device, program, and information processing method
WO2024043752A1 Method and electronic device for motion-based image enhancement
JP2021111279A Label noise detection program, label noise detection method, and label noise detection device
WO2024041108A1 Image correction model training method and apparatus, image correction method and apparatus, and computer device
CN111382606A Fall detection method, fall detection device, and electronic device
CN112183287A People counting method for mobile robot in complex background
US20220157050A1 (en) Image recognition device, image recognition system, image recognition method, and non-transitry computer-readable recording medium
EP4105893A1 (fr) Mise à jour de modèle de caméra dynamique d'intelligence artificielle
US20230401809A1 (en) Image data augmentation device and method
WO2022165735A1 Method and system for detecting a moving object
De Campos et al. A framework for automatic sports video annotation with anomaly detection and transfer learning
WO2021049119A1 Learning device, learning method, and non-transitory computer-readable medium storing learning program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23857784

Country of ref document: EP

Kind code of ref document: A1