WO2021112406A1 - Electronic apparatus and method for controlling thereof - Google Patents

Electronic apparatus and method for controlling thereof

Info

Publication number
WO2021112406A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
virtual object
electronic apparatus
user body
processor
Prior art date
Application number
PCT/KR2020/014721
Other languages
French (fr)
Inventor
Yongsung Kim
Daehyun Ban
Dongwan LEE
Hongpyo Lee
Lei Zhang
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd. filed Critical Samsung Electronics Co., Ltd.
Priority to EP20897014.5A priority Critical patent/EP4004878A4/en
Publication of WO2021112406A1 publication Critical patent/WO2021112406A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304Detection arrangements using opto-electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/041Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • G06V40/113Recognition of static hand signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/62Semi-transparency

Definitions

  • the disclosure relates to an electronic apparatus and method for controlling thereof. More particularly, the disclosure relates to an electronic apparatus that renders a virtual object on an image captured by a camera, and displays an AR image using the rendered virtual object and the captured image, and a method for controlling thereof.
  • Augmented reality (AR) technology is a technology that superimposes a 3D virtual image on a real image or background and displays it as a single image.
  • the AR technology is being used in various ways in everyday life, ranging from video games to smartphones, head-up displays (HUDs) on vehicle windshields, and the like.
  • the AR technology has a problem in that a plurality of cameras must capture a user and a space from various viewpoints for interaction between the user and a virtual object. Also, high-performance equipment is required to process the images captured by the plurality of cameras in real time.
  • aspects of the disclosure are to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the disclosure is to provide an electronic apparatus configured to render a virtual object on an image captured by one camera, and display an augmented reality image by using the rendered virtual object and the captured image, and a method of controlling thereof.
  • an electronic apparatus includes a display, a camera configured to capture a rear of the electronic apparatus facing a front of the electronic apparatus in which the display displays an image, and a processor configured to render a virtual object based on the image captured by the camera, based on a user body being detected from the captured image, estimate a plurality of joint coordinates with respect to the detected user body using a pre-trained learning model, generate an augmented reality image using the estimated plurality of joint coordinates, the rendered virtual object, and the captured image, and control the display to display the generated augmented reality image, wherein the processor is configured to identify whether the user body touches the virtual object based on the estimated plurality of joint coordinates, and change a transmittance of the virtual object based on the touch being identified.
  • the processor may be configured to estimate a plurality of joint coordinates corresponding to a finger joint and a palm using the pre-trained learning model based on the detected user body being identified to be a hand.
  • the processor may be configured to render a virtual hand object and the virtual object based on the estimated plurality of joint coordinates.
  • the processor may be configured to change a transmittance of one area of the virtual object corresponding to the touch.
  • the processor may be configured to change a transmittance of the user body and transparently display the user body based on the touch being identified.
  • the processor may be configured to receive depth data of the captured image from the camera, and generate the augmented reality image using the received depth data.
  • the pre-trained learning model may be configured to be trained through a plurality of learning data including hand images by using a convolutional neural network (CNN).
  • the plurality of learning data may be configured to include a first data in which a 3D coordinate is matched to at least one area of the hand image, and a second data in which the 3D coordinate is not matched to the hand image, and the pre-trained learning model is configured to be trained by updating a weight value of the CNN based on the first data and the second data.
  • a method of controlling an electronic apparatus includes capturing a rear of the electronic apparatus facing a front of the electronic apparatus in which a display displays an image, rendering a virtual object based on a captured image, based on a user body being detected from the captured image, estimating a plurality of joint coordinates with respect to the detected user body using a pre-trained learning model, generating an augmented reality image using the estimated plurality of joint coordinates, the rendered virtual object, and the captured image, displaying the generated augmented reality image, identifying whether the user body touches the virtual object based on the estimated plurality of joint coordinates, and changing a transmittance of the virtual object based on the touch being identified.
  • the estimating may include estimating a plurality of joint coordinates corresponding to a finger joint and a palm using the pre-trained learning model based on the detected user body being identified to be a hand.
  • the rendering may include rendering a virtual hand object and the virtual object based on the estimated plurality of joint coordinates.
  • the changing may include changing a transmittance of one area of the virtual object corresponding to the touch.
  • the method may further include changing a transmittance of the user body and transparently displaying the user body based on the touch being identified.
  • the generating may include receiving depth data of the captured image from the camera, and generating the augmented reality image using the received depth data.
  • the pre-trained learning model may be configured to be trained through a plurality of learning data including hand images by using a convolutional neural network (CNN).
  • the plurality of learning data may be configured to include a first data in which a 3D coordinate is matched to at least one area of the hand image, and a second data in which the 3D coordinate is not matched to the hand image, and wherein the pre-trained learning model is configured to be trained by updating a weight value of the CNN based on the first data and the second data.
  • FIG. 1 is a view illustrating an operation of an electronic apparatus according to an embodiment of the disclosure
  • FIG. 2 is a block diagram illustrating a configuration of an electronic apparatus according to an embodiment of the disclosure
  • FIG. 3 is a block diagram illustrating a detailed configuration of an electronic apparatus according to an embodiment of the disclosure.
  • FIG. 4A is a view illustrating a process of estimating a plurality of joint coordinates according to an embodiment of the disclosure
  • FIG. 4B is a view illustrating a process of estimating a plurality of joint coordinates according to an embodiment of the disclosure
  • FIG. 5 is a view illustrating an AR image displayed by an electronic apparatus according to an embodiment of the disclosure.
  • FIG. 6 is a view illustrating a process of rendering a virtual hand object on a user body according to an embodiment of the disclosure
  • FIG. 7 is a view illustrating an event corresponding to an object touch according to an embodiment of the disclosure.
  • FIG. 8 is a view illustrating an object touch according to an embodiment of the disclosure.
  • FIG. 9 is a flowchart illustrating a method of controlling an electronic apparatus according to an embodiment of the disclosure.
  • the disclosure describes components necessary for describing each embodiment of the disclosure, and is not limited thereto. Accordingly, some components may be changed or omitted, and other components may be added. In addition, the components may be distributed and arranged in different independent devices.
  • FIG. 1 is a view illustrating an operation of an electronic apparatus according to an embodiment of the disclosure.
  • a user wears an electronic apparatus 100 and interacts with a virtual object 1 using an augmented reality (AR) image 11 displayed on the electronic apparatus 100.
  • the electronic apparatus 100 is a device including a camera and a display. As shown in FIG. 1, the electronic apparatus 100 may be implemented in a form of wearable augmented reality (AR) glasses that can be worn by a user. Alternatively, as an embodiment, it may be implemented as at least one of a display apparatus, a smartphone, a laptop PC, a laptop computer, a desktop PC, a server, a camera device, and a wearable device including a communication function.
  • the electronic apparatus 100 may provide the AR image 11 to the user through a display, and may use a camera to capture the rear of the electronic apparatus 100 facing the front of the electronic apparatus 100 on which the display displays an image, so that the user may move the hand 15 and the user body 10 may interact with the virtual object 1.
  • the AR image 11 is an image provided by the electronic apparatus 100 through a display, and may display the user body 10 and the virtual object 1.
  • the camera included in the electronic apparatus 100 captures a space where the user exists, and the AR image 11 may present objects and a surrounding environment that actually exist to the user through the captured image.
  • a display is disposed on the front of the electronic apparatus 100 to provide the AR image 11 to the user
  • a camera is disposed on the rear of the electronic apparatus 100 to capture the user's surroundings and the user body.
  • since the electronic apparatus 100 captures the user's surroundings and the user body from the direction of the user's gaze, that is, from a first-person perspective, and provides an image generated based on that perspective, the electronic apparatus 100 may provide a realistic AR image.
  • the electronic apparatus 100 may include a camera.
  • by using a single camera, the electronic apparatus 100 may guarantee real-time image processing without requiring a high-performance device.
  • the electronic apparatus 100 may estimate 3D coordinates of the user body by using a pre-trained learning model even when only a portion of the user body (e.g., hand) is captured.
  • the electronic apparatus 100 may estimate an exact location and motion of the user body despite using a single camera, and based on this, may provide a service capable of interacting with a virtual object to the user.
  • FIG. 2 is a block diagram illustrating a configuration of an electronic apparatus according to an embodiment of the disclosure.
  • the electronic apparatus 100 may include a camera 110, a display 120, and a processor 130.
  • the camera 110 may capture the rear of the electronic apparatus facing the front of the electronic apparatus 100 in which the display 120 displays an image (S210).
  • the camera 110 may capture a space where the user exists and the user body.
  • the camera 110 may be disposed on the rear or side of the electronic apparatus 100 to capture the rear of the electronic apparatus 100.
  • the camera 110 is described as being disposed on the rear or side of the electronic apparatus 100, but the electronic apparatus 100 may be implemented as a wearable AR glasses device, a smartphone, or the like, as illustrated in FIG. 1.
  • the camera 110 may capture an image with a variable direction depending on the user’s movement and the user’s gaze.
  • the camera 110 may be connected to the processor 130 through a wired or wireless communication method.
  • the image captured by the camera 110 may be provided to the user in real time after a series of processing by the processor 130.
  • the image captured by the camera 110 may be used as a basis for generating an AR image by the processor 130 described below.
  • the image captured by the camera 110 may be an RGB image including RGB data.
  • the camera 110 may be a 3D camera capable of acquiring depth data.
  • the processor 130 may acquire the depth data from the image captured by the camera 110, and use the acquired depth data as a basis for generating an AR image.
  • the display 120 may be disposed in front of the electronic apparatus 100.
  • the display 120 may be connected to the processor 130 by wired or wireless, and the display 120 may display various information under the control of the processor 130.
  • the display 120 may display an AR image generated by the processor 130 (S250). Since the display 120 displays the AR image generated based on the image captured by the camera 110 disposed on the rear of the electronic apparatus 100, the AR image displayed by the display 120 may be a first person perspective image.
  • the display 120 may be implemented in a form of a general display such as a Liquid Crystal Display (LCD), a Light Emitting Diode (LED), an Organic Light Emitting Diode (OLED), a Quantum dot Light Emitting Diode (QLED), etc., and according to another embodiment, the display 120 may also be implemented as a transparent display. Specifically, the display 120 is made of a transparent material, so light outside the electronic apparatus 100 may penetrate the display 120 to reach the user, and the user may observe the user body and the external environment through the display 120.
  • the transparent display may be implemented as a transparent liquid crystal display (LCD) type, a transparent thin-film electroluminescent panel (TFEL) type, a transparent Organic Light Emitting Diode (OLED) type, or the like, and may be implemented in a form of displaying by projecting an image on a transparent screen (e.g., head-up display (HUD)).
  • the processor 130 may control the display 120 such that only virtual objects are displayed on the display 120.
  • the processor 130 may control overall operations and functions of the electronic apparatus 100.
  • the processor 130 may render a virtual object based on the image captured by the camera 110, estimate a plurality of joint coordinates for the detected user body using the pre-trained learning model when a user body is detected in the captured image, generate an AR image by using the estimated joint coordinates, the rendered virtual object, and the captured image, and control the display 120 to display the generated AR image.
  • the processor 130 may be electrically connected to the camera 110, and may receive data including the image captured by the camera 110 from the camera 110.
  • the processor 130 may render a virtual object based on the image captured by the camera 110 (S220).
  • rendering may refer to generating a second image including a virtual object to correspond to a first image captured by the camera 110.
  • rendering may mean generating a virtual object to correspond to a certain area of the captured image. Since the processor 130 renders a virtual object based on the captured image, the rendered virtual object may include depth information about space.
  • the processor 130 may estimate a plurality of joint coordinates for the detected user body using a pre-trained learning model (S230). Specifically, the processor 130 may estimate the plurality of joint coordinates for the user body using RGB data included in the captured image. First, the processor 130 may detect a user body from the captured image and extract RGB data including the user body from the captured image. The processor 130 may then estimate motions, shapes, and predicted coordinates of the user body by inputting the extracted RGB data into the pre-trained learning model. According to another embodiment, depth data may be included in an image captured by the camera 110, and the coordinates may be additionally estimated using the depth data.
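  • As an illustrative sketch only, the estimation step described above might be organized as follows; the hand detector, the keypoint model, the 21-joint output shape, and all function names are assumptions for illustration and not the disclosed implementation.

```python
# Sketch of the joint-coordinate estimation step (S230): detect the user body
# (hand) in the captured RGB frame, crop the corresponding RGB data, and feed
# it to a pre-trained CNN that outputs 21 estimated 3D joint coordinates.
# Resizing/normalization details are omitted; shapes and names are assumed.
from typing import Optional
import numpy as np
import torch

def estimate_joint_coordinates(frame_rgb: np.ndarray,
                               hand_detector,
                               keypoint_model) -> Optional[np.ndarray]:
    """Return an array of shape (21, 3) of estimated joint coordinates,
    or None if no user body (hand) is detected in the captured frame."""
    # 1. Detect the user body (hand) and extract the RGB region containing it.
    bbox = hand_detector(frame_rgb)              # assumed: (x, y, w, h) or None
    if bbox is None:
        return None
    x, y, w, h = bbox
    hand_crop = frame_rgb[y:y + h, x:x + w]

    # 2. Input the extracted RGB data to the pre-trained learning model.
    inp = torch.from_numpy(hand_crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        joints = keypoint_model(inp)             # assumed output: (1, 21, 3)

    # 3. The 21 estimated (x, y, z) joint coordinates for the fingers and palm.
    return joints.squeeze(0).cpu().numpy()
```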
  • the pre-trained learning model may be a learning model trained through a plurality of learning data including a hand image using a convolutional neural network (CNN).
  • the processor 130 may generate an AR image using the estimated joint coordinates, the rendered virtual object, and the captured image (S240).
  • the AR image may refer to a third image generated by matching or calibrating the first image captured by the camera and the second image including the virtual object.
  • the processor 130 may control the display 120 to display the generated AR image.
  • the user may interact with the virtual object through the AR image.
  • the processor 130 may check whether the user body touches the virtual object based on the estimated joint coordinates, and perform an event corresponding to the object touch when the object touch is detected.
  • the event may mean changing a transmittance of the virtual object.
  • the processor 130 may change an alpha value of the virtual object included in the generated AR image by a unit of pixel.
  • the processor 130 may change the transmittance of the virtual object (S260). Alternatively, the processor 130 may change only the transmittance of an area of the virtual object corresponding to the touch. In other words, the processor 130 may change the transmittance of the virtual object by changing alpha values of all pixels corresponding to the virtual object or by changing only an alpha value of the pixel in the area of the virtual object.
  • the processor 130 may identify an object touch based on whether the estimated joint coordinates are positioned at coordinates corresponding to the rendered virtual object.
  • the processor 130 may track joint coordinates in real time or at a predetermined time interval through an image captured in real time by the camera 110. As described above with reference to FIG. 1, since only the image captured through one camera is used and only the plurality of joint coordinates for the user body are estimated by using RGB data of the captured image, the processor 130 may perform real-time image processing.
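  • As a simplified sketch of the identification described above, and under the assumption that the rendered virtual object occupies an axis-aligned 3D box and is composited from an RGBA layer, the touch check and the per-pixel alpha (transmittance) change might look as follows; these representations are illustrative assumptions, not the disclosed implementation.

```python
# Touch identification: an object touch is identified when any estimated joint
# coordinate lies inside the volume occupied by the rendered virtual object.
# Transmittance change: lower the alpha value, pixel by pixel, for either all
# pixels of the object or only the touched area (selected via `mask`).
import numpy as np

def touches_virtual_object(joints: np.ndarray, box_min: np.ndarray,
                           box_max: np.ndarray) -> bool:
    """joints: (21, 3); box_min/box_max: (3,) corners of the object volume."""
    inside = np.all((joints >= box_min) & (joints <= box_max), axis=1)
    return bool(inside.any())

def change_transmittance(object_rgba: np.ndarray, mask: np.ndarray,
                         alpha: float = 0.3) -> np.ndarray:
    """object_rgba: (H, W, 4) rendered virtual-object layer; mask: (H, W) bool."""
    out = object_rgba.copy()
    out[mask, 3] = int(alpha * 255)   # smaller alpha -> more transparent object
    return out
```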
  • FIG. 3 is a block diagram illustrating a detailed configuration of an electronic apparatus according to an embodiment of the disclosure.
  • the electronic apparatus 100 may include the camera 110, the display 120, the processor 130, a communication interface 140, a memory 150, and a sensor 160. Meanwhile, since the camera 110, the display 120, and the processor 130 illustrated in FIG. 3 have been described in FIG. 2, redundant description will be omitted.
  • the communication interface 140 may communicate with an external apparatus (not shown).
  • the communication interface 140 may be connected to an external device through communication via a third device (e.g., a repeater, a hub, an access point, a server, a gateway, etc.).
  • the communication interface 140 may include various communication modules to perform communication with an external device.
  • the communication interface 140 may include an NFC module, a wireless communication module, an infrared module, and a broadcast receiving module.
  • the communication interface 140 may receive information related to the operation of the electronic apparatus 100 from an external device. According to an embodiment, the electronic apparatus 100 may receive a pre-trained learning model from an external server or device through the communication interface 140, and may use the communication interface 140 so that the coordinates of the user body are estimated by an external high-performance server or device. Further, the communication interface 140 may be used to update information stored in the electronic apparatus 100.
  • the memory 150 may store a command or data regarding at least one of the other elements of the electronic apparatus 100.
  • the memory 150 may be implemented as a non-volatile memory, a volatile memory, a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 150 may be accessed by the processor 130, and perform readout, recording, correction, deletion, update, and the like, on data by the processor 130.
  • the term storage may include the memory 150, read-only memory (ROM) (not illustrated) and random access memory (RAM) (not illustrated) within the processor 130, and a memory card (not illustrated) attached to the electronic apparatus 100 (e.g., a micro secure digital (SD) card or a memory stick).
  • the memory 150 may store a program, data, and the like for constituting various types of screens that will be displayed in the display area of the display 120.
  • the memory 150 may store data for displaying the AR image. Specifically, the memory 150 may store an image captured by the camera 110 and a second image including a virtual object generated by the processor 130. Also, the memory 150 may store the AR image generated based on the captured image and the rendered virtual object. Also, the memory 150 may store the plurality of joint coordinates of the user body estimated by the processor 130 in real time.
  • the sensor 160 may detect an object. Specifically, the sensor 160 may sense an object by sensing physical changes such as heat, light, temperature, pressure, sound, or the like. Also, the sensor 160 may output coordinate information about the sensed object. Specifically, the sensor 160 may acquire 3D point information of the sensed object or output coordinate information based on a distance.
  • the sensor 160 may be a lidar sensor, a radar sensor, an infrared sensor, an ultrasonic sensor, a radio frequency (RF) sensor, a depth sensor, or a distance measurement sensor.
  • the sensor 160 is a type of active sensor and may transmit a specific signal to measure a time of flight (ToF).
  • the ToF is a time-of-flight distance measurement method, in which a distance is measured from the time difference between a reference time point at which a pulse is emitted and the time point at which the pulse reflected back from a measurement object is detected.
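  • As a brief numerical illustration of the ToF principle described above (not part of the disclosure), the one-way distance is half the measured round-trip time multiplied by the propagation speed of the pulse, i.e., the speed of light for an optical or RF pulse.

```python
# Distance from time of flight: d = c * t_round_trip / 2
SPEED_OF_LIGHT = 299_792_458.0  # m/s, for an optical or RF pulse

def tof_distance(round_trip_time_s: float) -> float:
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

print(tof_distance(10e-9))  # a 10 ns round trip corresponds to about 1.5 m
```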
  • FIGS. 4A and 4B are views illustrating a process of estimating a plurality of joint coordinates according to an embodiment of the disclosure.
  • referring to FIG. 4A, a user body 40 and a plurality of joint coordinates 41 to 46 corresponding to finger joints and a palm are illustrated.
  • FIG. 4B illustrates hand images in which the plurality of joint coordinates 41 to 46 of FIG. 4A are connected, with the hand shown in a plurality of configurations.
  • the electronic apparatus 100 may use a pre-trained learning model as a method of estimating a plurality of joint coordinates corresponding to a finger joint and a palm.
  • the learning model may be trained through a plurality of learning data or training data including a human hand image using a convolutional neural network (CNN).
  • the learning model may be trained based on data in which 3D coordinates are assigned to at least one region of the hand image and data in which the 3D coordinates are not matched to the hand image.
  • specifically, the data in which 3D coordinates are assigned to at least one region of the hand image may be learned by the learning model.
  • output data may be obtained from the learning model using data in which the 3D coordinates are not matched to the hand image, and a loss function or error between the output data and the data in which 3D coordinates are input in at least one region of the hand image may be calculated.
  • the learning model may be trained through a process of updating a weight value of the CNN using the calculated loss function or error.
  • the learning data or training data may be a hand image in which 3D coordinates are assigned to a plurality of joints including a fingertip and a finger joint.
  • the learning model may be formed by machine learning, or may be trained through deep learning after 21 coordinates, as shown in FIG. 4A, are input for each of the hand images including various hand shapes and sizes.
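  • The following is a simplified, hypothetical training sketch consistent with the description above: a small CNN regresses 21 3D joint coordinates from a hand image, and its weight values are updated from the error (loss) between the predicted coordinates and the annotated 3D coordinates. The architecture, loss, and optimizer are assumptions, not the disclosed model.

```python
# Hypothetical CNN and one training step for hand-joint regression.
import torch
import torch.nn as nn

class HandKeypointCNN(nn.Module):
    def __init__(self, num_joints: int = 21):
        super().__init__()
        self.num_joints = num_joints
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_joints * 3)    # (x, y, z) per joint

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.head(f).view(-1, self.num_joints, 3)

def train_step(model, optimizer, images, target_joints):
    """One weight update from hand images annotated with 3D joint coordinates."""
    pred = model(images)                              # (B, 21, 3)
    loss = nn.functional.mse_loss(pred, target_joints)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                  # update the CNN weight values
    return loss.item()
```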
  • the hand image illustrated in FIG. 4B may be learning data or training data for learning the learning model.
  • FIG. 4A shows a user body 40 composed of 20 points corresponding to the fingers and one point corresponding to the palm, a total of 21 points.
  • the 21 points included in the user body 40 may be the basis for estimating a plurality of joint coordinates for the user body.
  • the electronic apparatus 100 may more accurately identify the joint coordinates and location of the user body included in the captured image.
  • the number of points included in the user body 40 of FIG. 4A is only an embodiment, and the number of points set according to the performance of the electronic apparatus 100 may be appropriately selected in the range of 5 to 40 during implementation.
  • the plurality of hand images may be learning data or training data for learning the learning model.
  • the electronic apparatus 100 may estimate the joint coordinates of the user by learning the hand image illustrated in FIG. 4B.
  • the electronic apparatus 100 may estimate the coordinates of hidden fingers from only part of the captured image even when a part of the hand, for example, the fingers, is hidden by another object or subject.
  • for example, assume that the learning model has learned the data of FIG. 4B and that a hand image similar to the data of FIG. 4B is given.
  • in this case, RGB and depth data for the fingers hidden by another object or subject may not be obtained, and accurate coordinates for the hidden fingers may not be obtained from the given data alone.
  • however, the electronic apparatus 100 according to the disclosure may estimate the coordinates of fingers hidden by other objects or subjects by using the pre-trained learning model.
  • the above example is an example for convenience of description, and data related to all movements and postures of the hand are not required to estimate the coordinates of the hand according to an embodiment of the disclosure.
  • FIG. 5 is a view illustrating an AR image displayed by an electronic apparatus according to an embodiment of the disclosure.
  • a first image 51 including a user body 50, a second image 52 including a virtual object 5, and a third image 53 including a user body 50 and a virtual object 5 are illustrated.
  • the first image 51 may be an image captured by a camera included in the electronic apparatus 100. Specifically, the rear of the electronic apparatus 100 may be captured by the camera included in the electronic apparatus 100, and the captured image may include the user body 50. The user body 50 captured by the camera is illustrated in the first image 51.
  • the second image 52 may be an image generated by the electronic apparatus 100 by rendering the virtual object 5 based on the captured image so as to correspond to the first image captured by the camera included in the electronic apparatus 100. Since the virtual object 5 rendered by the electronic apparatus 100 is generated based on the captured image, depth information on space may be included.
  • the third image 53 may be an image generated by the electronic apparatus 100 matching and calibrating the first image 51 and the second image 52.
  • the electronic apparatus 100 may extract RGB data including the user body from the first image captured by the camera included in the electronic apparatus 100. Then, the electronic apparatus 100 may estimate motions, shapes, and predicted coordinates of the user body by inputting the extracted RGB data into the pre-trained learning model. Alternatively, the captured image may include depth data, and the electronic apparatus 100 may estimate the plurality of joint coordinates for the user body by additionally using the depth data.
  • the electronic apparatus 100 may detect a hand of the user body 50 included in the first image 51, and estimate coordinates of the user body 50 detected by using the pre-trained learning model.
  • the electronic apparatus 100 may estimate the coordinates of the user body 50 by additionally using the depth data of the user body 50 from the captured image.
  • the electronic apparatus 100 may match or calibrate the first image 51 and the second image 52 by using the estimated joint coordinates of the user body 50 and the depth information of the virtual object 5. Since the electronic apparatus 100 may generate an augmented reality image using the estimated coordinates of the user body and the coordinates of the virtual object, the user may be able to interact with the virtual object displayed through the display of the electronic apparatus 100.
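  • As an illustrative sketch of the matching/calibration step, the AR image can be thought of as a depth-aware composite of the captured frame (first image) and the rendered virtual-object layer (second image): the object is drawn only where it is nearer to the camera than the real scene, so a hand that is closer than the object correctly occludes it. The RGBA/depth layer format below is an assumption, not the disclosed data format.

```python
# Depth-aware composition of the captured image and the rendered object layer.
import numpy as np

def compose_ar_image(camera_rgb: np.ndarray,     # (H, W, 3) captured frame
                     camera_depth: np.ndarray,   # (H, W) depth of real scene
                     object_rgba: np.ndarray,    # (H, W, 4) rendered object
                     object_depth: np.ndarray    # (H, W) depth of object
                     ) -> np.ndarray:
    out = camera_rgb.astype(np.float32).copy()
    alpha = object_rgba[..., 3:4].astype(np.float32) / 255.0
    # Draw the object only where it exists and is nearer than the real scene.
    in_front = (alpha[..., 0] > 0) & (object_depth < camera_depth)
    blended = alpha * object_rgba[..., :3] + (1.0 - alpha) * out
    out[in_front] = blended[in_front]
    return out.astype(np.uint8)
```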
  • FIG. 6 is a view illustrating a process of rendering a virtual hand object on a user body according to an embodiment of the disclosure.
  • an augmented reality image 61 including a virtual hand object 60 is illustrated.
  • the electronic apparatus 100 may estimate a plurality of joint coordinates for the user body, and render a virtual hand object based on the estimated plurality of joint coordinates.
  • the electronic apparatus 100 may detect the user's hand from the captured image and estimate the plurality of joint coordinates for the detected user's hand.
  • the electronic apparatus 100 may render a virtual hand object at coordinates corresponding to the estimated plurality of joint coordinates, and the electronic apparatus 100 may output a virtual hand object instead of the user body.
  • the electronic apparatus 100 may track movements of the user's hand in real time, and the electronic apparatus 100 superimposes a virtual hand object on the user's hand, so that the user may move the virtual hand object 60 in the augmented reality image 61 just like moving the user’s hand, and interact with the virtual object.
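  • As a minimal sketch of overlaying a virtual hand object at the estimated joints, the example below simply draws a skeleton over the camera frame with OpenCV; an actual implementation could render a full 3D hand mesh instead. The bone connectivity and drawing style are assumptions.

```python
# Draw a skeletal "virtual hand" at the projected 2D joint positions.
import cv2
import numpy as np

HAND_BONES = [(0, 1), (1, 2), (2, 3), (3, 4)]   # hypothetical thumb chain only

def draw_virtual_hand(frame: np.ndarray, joints_2d: np.ndarray) -> np.ndarray:
    """frame: (H, W, 3) image; joints_2d: (N, 2) projected joint positions."""
    out = frame.copy()
    for a, b in HAND_BONES:
        pa = tuple(int(round(v)) for v in joints_2d[a])
        pb = tuple(int(round(v)) for v in joints_2d[b])
        cv2.line(out, pa, pb, (0, 255, 0), 2)        # bone
    for p in joints_2d:
        c = tuple(int(round(v)) for v in p)
        cv2.circle(out, c, 4, (0, 0, 255), -1)       # joint
    return out
```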
  • when the electronic apparatus 100 detects an object touch, at least one of the user body or the virtual object may be transparently displayed.
  • an embodiment in which the electronic apparatus 100 transparently displays a virtual object or a user body will be described with reference to FIGS. 7 and 9.
  • FIG. 7 is a view illustrating an event corresponding to an object touch according to an embodiment of the disclosure.
  • an augmented reality image 71 including a virtual object 7 and a user body 70 is illustrated.
  • the electronic apparatus 100 may identify whether the user body 70 touches the virtual object 7 based on the estimated joint coordinates, and perform an event corresponding to the object touch when the object touch is detected.
  • the event may refer to changing a transmittance of the virtual object 7 or a color of the displayed virtual object 7.
  • the electronic apparatus 100 may change an alpha value of the virtual object 7 included in the generated augmented reality image 71 by a unit of pixel.
  • the electronic apparatus 100 may change the transmittance of the virtual object 7.
  • the electronic apparatus 100 may change only the transmittance of an area of the virtual object 7 corresponding to the touch.
  • the electronic apparatus 100 may change the transmittance of the virtual object by changing the alpha values of all pixels corresponding to the virtual object 7 or by changing only the alpha values of pixels in one region of the virtual object 7.
  • FIG. 8 is a view illustrating an object touch according to an embodiment of the disclosure.
  • an augmented reality image 81 including a virtual object 8 and user bodies 80a and 80b is illustrated.
  • the electronic apparatus 100 may identify whether the user bodies 80a and 80b touch the virtual object 8 based on the estimated joint coordinates, and perform an event corresponding to the object touch when the object touch is detected.
  • the event may refer to changing a transmittance of the rendered virtual hand object 80b.
  • the electronic apparatus 100 may change an alpha value of the virtual hand object 80b included in the generated augmented reality image by a unit of pixel.
  • the electronic apparatus 100 may change a transmittance of the virtual hand object 80b when a touch of the user bodies 80a and 80b and the virtual object 8 is identified.
  • the electronic apparatus 100 may change only the transmittance of one area of the virtual hand object 80b corresponding to the touch.
  • the virtual object 8 may be rendered by the electronic apparatus 100 and occupy a certain area in space (e.g., a cuboid area of a certain size).
  • the electronic apparatus 100 may identify the 3D coordinates of the virtual object 8 and also estimate the 3D coordinates of the user bodies 80a and 80b.
  • the electronic apparatus 100 may identify whether the object is touched by comparing the 3D coordinates of the virtual object 8 with those of the user bodies 80a and 80b.
  • the electronic apparatus 100 may identify whether a part of the user bodies 80a and 80b touches the virtual object 8.
  • the electronic apparatus 100 may change a transmittance of the part of the user body (the virtual hand object 80b) determined to have touched the virtual object 8, or invert the color of the rendered virtual hand object 80b.
  • FIG. 9 is a flowchart illustrating a method of controlling an electronic apparatus according to an embodiment of the disclosure.
  • the electronic apparatus 100 may include a camera, a display, and a processor.
  • the electronic apparatus 100 may capture a rear of the electronic apparatus 100 facing a front of the electronic apparatus 100 on which the display displays an image (S910).
  • the terms front and rear are used for convenience of description, and the surfaces may instead be referred to as a first surface and a second surface.
  • the electronic apparatus 100 may render a virtual object based on the captured image (S920). Specifically, when an image captured by the camera is referred to as a first image or a first layer, the electronic apparatus 100 may generate a second image or a second layer including a virtual object based on the first image or the first layer.
  • the electronic apparatus 100 may estimate a plurality of joint coordinates for the detected user body using a pre-trained learning model (S930). Specifically, when the user body detected from the captured image is a hand, the electronic apparatus 100 may estimate a plurality of joint coordinates corresponding to the finger joint and the palm by using the pre-trained learning model.
  • the pre-trained learning model may be learned through a plurality of learning data including hand images.
  • the electronic apparatus 100 may generate an augmented reality image using the estimated plurality of joint coordinates, the rendered virtual object, and the captured image (S940).
  • the electronic apparatus 100 may display the generated augmented reality image (S950).
  • the electronic apparatus 100 may identify whether the user body touches the virtual object in the displayed augmented reality image, and perform an event corresponding to the object touch when the object touch is detected. In particular, the electronic apparatus 100 may identify whether the user body touches the virtual object based on the estimated joint coordinates, and change a transmittance of the virtual object when the touch is confirmed (S960).
  • the electronic apparatus 100 may change an alpha value of the virtual object included in the generated augmented reality image by a unit of pixel, and the electronic apparatus 100 may identify an object touch based on whether the estimated joint coordinates are located at coordinates corresponding to the rendered virtual object. Also, the electronic apparatus 100 may change only the transmittance of an area of the virtual object corresponding to the touch. In other words, the electronic apparatus 100 may change the transmittance of the virtual object by changing the alpha value of all pixels corresponding to the virtual object or by changing only the pixel alpha value of the area of the virtual object.
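  • Putting the operations of FIG. 9 together, a control loop might look like the sketch below, reusing the hypothetical helpers from the earlier sketches (estimate_joint_coordinates, touches_virtual_object, change_transmittance, compose_ar_image); the camera, scene, and display objects and their methods are likewise assumptions rather than disclosed APIs.

```python
# Hypothetical control loop corresponding to operations S910-S960 of FIG. 9.
def run_ar_loop(camera, display, hand_detector, keypoint_model, scene):
    while True:
        frame_rgb, frame_depth = camera.capture()                          # S910
        object_rgba, object_depth, box = scene.render(frame_rgb)           # S920
        joints = estimate_joint_coordinates(frame_rgb, hand_detector,
                                            keypoint_model)                # S930
        if joints is not None and touches_virtual_object(
                joints, box.min_corner, box.max_corner):
            mask = object_rgba[..., 3] > 0       # whole object (or touched area)
            object_rgba = change_transmittance(object_rgba, mask)          # S960
        ar_image = compose_ar_image(frame_rgb, frame_depth,
                                    object_rgba, object_depth)             # S940
        display.show(ar_image)                                             # S950
```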
  • the term "module" includes units made up of hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic blocks, components, or circuits.
  • a “module” may be an integrally constructed component or a minimum unit or part thereof that performs one or more functions.
  • the module may be composed of an application-specific integrated circuit (ASIC).
  • the various example embodiments described above may be implemented as an S/W program including an instruction stored on machine-readable (e.g., computer-readable) storage media.
  • the machine is an apparatus which is capable of calling a stored instruction from the storage medium and operating according to the called instruction, and may include an electronic apparatus (e.g., an electronic apparatus A) according to the above-described example embodiments.
  • when the instruction is executed by a processor, the processor may perform a function corresponding to the instruction directly or by using other components under the control of the processor.
  • the command may include a code generated or executed by a compiler or an interpreter.
  • a machine-readable storage medium may be provided in the form of a non-transitory storage medium.
  • the term "non-transitory" only denotes that a storage medium does not include a signal and is tangible, and does not distinguish a case in which data is semi-permanently stored in a storage medium from a case in which data is temporarily stored in a storage medium.
  • the respective components may include a single entity or a plurality of entities, and some of the corresponding sub-components described above may be omitted, or another sub-component may be further added to the various example embodiments.
  • Operations performed by a module, a program module, or other component, according to various embodiments may be sequential, parallel, or both, executed iteratively or heuristically, or at least some operations may be performed in a different order, omitted, or other operations may be added.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Architecture (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An electronic apparatus is provided. The electronic apparatus includes a display, a camera configured to capture a rear of the electronic apparatus facing a front of the electronic apparatus in which the display displays an image, and a processor configured to render a virtual object based on the image captured by the camera, based on a user body being detected from the captured image, estimate a plurality of joint coordinates with respect to the detected user body using a pre-trained learning model, generate an augmented reality image using the estimated plurality of joint coordinates, the rendered virtual object, and the captured image, and control the display to display the generated augmented reality image, wherein the processor is configured to identify whether the user body touches the virtual object based on the plurality of estimated joint coordinates, and change a transmittance of the virtual object based on the touch being identified.

Description

ELECTRONIC APPARATUS AND METHOD FOR CONTROLLING THEREOF
The disclosure relates to an electronic apparatus and method for controlling thereof. More particularly, the disclosure relates to an electronic apparatus that renders a virtual object on an image captured by a camera, and displays an AR image using the rendered virtual object and the captured image, and a method for controlling thereof.
Augmented reality (AR) technology is a technology that superimposes a 3D virtual image on a real image or background and displays it as a single image. The AR technology is being used in various ways in everyday life, ranging from video games to smartphones, head-up displays (HUDs) on vehicle windshields, and the like.
However, in the case of the AR technology, an image is output by superimposing a virtual object on an image received by a camera, and there has been a problem in that even when a user's hand is closer to the camera than a virtual object, the virtual object is rendered over the user's hand so that the user's hand appears to be farther away from the camera than the virtual object.
In addition, the AR technology has a problem in that a plurality of cameras must capture a user and a space from various viewpoints for interaction between the user and a virtual object. Also, high-performance equipment is required to process the images captured by the plurality of cameras in real time.
The above information is presented as background information only to assist with an understanding of the disclosure. No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the disclosure.
Aspects of the disclosure are to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the disclosure is to provide an electronic apparatus configured to render a virtual object on an image captured by one camera, and display an augmented reality image by using the rendered virtual object and the captured image, and a method of controlling thereof.
Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.
In accordance with an aspect of the disclosure, an electronic apparatus is provided. The apparatus includes a display, a camera configured to capture a rear of the electronic apparatus facing a front of the electronic apparatus in which the display displays an image, and a processor configured to render a virtual object based on the image captured by the camera, based on a user body being detected from the captured image, estimate a plurality of joint coordinates with respect to the detected user body using a pre-trained learning model, generate an augmented reality image using the estimated plurality of joint coordinates, the rendered virtual object, and the captured image, and control the display to display the generated augmented reality image, wherein the processor is configured to identify whether the user body touches the virtual object based on the estimated plurality of joint coordinates, and change a transmittance of the virtual object based on the touch being identified.
The processor may be configured to estimate a plurality of joint coordinates corresponding to a finger joint and a palm using the pre-trained learning model based on the detected user body being identified to be a hand.
The processor may be configured to render a virtual hand object and the virtual object based on the estimated plurality of joint coordinates.
The processor may be configured to change a transmittance of one area of the virtual object corresponding to the touch.
The processor may be configured to change a transmittance of the user body and transparently display the user body based on the touch being identified.
The processor may be configured to receive depth data of the captured image from the camera, and generate the augmented reality image using the received depth data.
The pre-trained learning model may be configured to be trained through a plurality of learning data including hand images by using a convolutional neural network (CNN).
The plurality of learning data may be configured to include a first data in which a 3D coordinate is matched to at least one area of the hand image, and a second data in which the 3D coordinate is not matched to the hand image, and the pre-trained learning model is configured to be trained by updating a weight value of the CNN based on the first data and the second data.
In accordance with another aspect of the disclosure, a method of controlling an electronic apparatus is provided. The method includes capturing a rear of the electronic apparatus facing a front of the electronic apparatus in which a display displays an image, rendering a virtual object based on a captured image, based on a user body being detected from the captured image, estimating a plurality of joint coordinates with respect to the detected user body using a pre-trained learning model, generating an augmented reality image using the estimated plurality of joint coordinates, the rendered virtual object, and the captured image, displaying the generated augmented reality image, identifying whether the user body touches the virtual object based on the estimated plurality of joint coordinates, and changing a transmittance of the virtual object based on the touch being identified.
The estimating may include estimating a plurality of joint coordinates corresponding to a finger joint and a palm using the pre-trained learning model based on the detected user body being identified to be a hand.
The rendering may include rendering a virtual hand object and the virtual object based on the estimated plurality of joint coordinates.
The changing may include changing a transmittance of one area of the virtual object corresponding to the touch.
The method may further include changing a transmittance of the user body and transparently displaying the user body based on the touch being identified.
The generating may include receiving depth data of the captured image from the camera, and generating the augmented reality image using the received depth data.
The pre-trained learning model may be configured to be trained through a plurality of learning data including hand images by using a convolutional neural network (CNN).
The plurality of learning data may be configured to include first data in which a 3D coordinate is matched to at least one area of the hand image, and second data in which the 3D coordinate is not matched to the hand image, and wherein the pre-trained learning model is configured to be trained by updating a weight value of the CNN based on the first data and the second data.
Other aspects, advantages, and salient features of the disclosure will become apparent to those skilled in the art from the following detailed description, which, taken in conjunction with the annexed drawings, discloses various embodiments of the disclosure.
The above and other aspects, features, and advantages of certain embodiments of the disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a view illustrating an operation of an electronic apparatus according to an embodiment of the disclosure;
FIG. 2 is a block diagram illustrating a configuration of an electronic apparatus according to an embodiment of the disclosure;
FIG. 3 is a block diagram illustrating a detailed configuration of an electronic apparatus according to an embodiment of the disclosure;
FIG. 4A is a view illustrating a process of estimating a plurality of joint coordinates according to an embodiment of the disclosure;
FIG. 4B is a view illustrating a process of estimating a plurality of joint coordinates according to an embodiment of the disclosure;
FIG. 5 is a view illustrating an AR image displayed by an electronic apparatus according to an embodiment of the disclosure;
FIG. 6 is a view illustrating a process of rendering a virtual hand object on a user body according to an embodiment of the disclosure;
FIG. 7 is a view illustrating an event corresponding to an object touch according to an embodiment of the disclosure;
FIG. 8 is a view illustrating an object touch according to an embodiment of the disclosure; and
FIG. 9 is a flowchart illustrating a method of controlling an electronic apparatus according to an embodiment of the disclosure.
The same reference numerals are used to represent the same elements throughout the drawings.
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the disclosure as defined by the claims and their equivalents. It includes various specific details to assist in that understanding but these are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the various embodiments described herein can be made without departing from the scope and spirit of the disclosure. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
The terms and words used in the following description and claims are not limited to the bibliographical meanings, but, are merely used by the inventor to enable a clear and consistent understanding of the disclosure. Accordingly, it should be apparent to those skilled in the art that the following description of various embodiments of the disclosure is provided for illustration purpose only and not for the purpose of limiting the disclosure as defined by the appended claims and their equivalents.
It is to be understood that the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a component surface” includes reference to one or more of such surfaces.
The terms “have”, “may have”, “include”, and “may include” used in the embodiments of the disclosure indicate the presence of corresponding features (for example, elements such as numerical values, functions, operations, or parts), and do not preclude the presence of additional features.
In addition, the disclosure describes components necessary for disclosure of each embodiment of the disclosure, and thus is not limited thereto. Accordingly, some components may be changed or omitted, and other components may be added. In addition, they may be distributed and arranged in different independent devices.
Hereinafter, the disclosure will be described in more detail with reference to the drawings.
FIG. 1 is a view illustrating an operation of an electronic apparatus according to an embodiment of the disclosure.
Referring to FIG. 1, a user wears an electronic apparatus 100 and the user interacts with a virtual object 1 using an Augmented reality (AR) image 11 displayed on the electronic apparatus 100.
The electronic apparatus 100 is a device including a camera and a display. As shown in FIG. 1, the electronic apparatus 100 may be implemented in a form of wearable augmented reality (AR) glasses that can be worn by a user. Alternatively, as an embodiment, it may be implemented as at least one of a display apparatus, a smartphone, a laptop computer, a desktop PC, a server, a camera device, and a wearable device including a communication function.
The electronic apparatus 100 may provide the AR image 11 to the user through a display, and may use a camera to capture a rear of the electronic apparatus 100 facing a front of the electronic apparatus 100 in which the display displays an image. By capturing the movement of the user's hand 15 in this way, the electronic apparatus 100 allows the user body 10 to interact with the virtual object 1.
The AR image 11 is an image provided by the electronic apparatus 100 through a display, and may display the user body 10 and the virtual object 1. In addition, the camera included in the electronic apparatus 100 captures a space where the user exists, and the AR image 11 may present to the user, through the captured image, objects and surroundings that actually exist.
A display is disposed on the front of the electronic apparatus 100 to provide the AR image 11 to the user, and a camera is disposed on the rear of the electronic apparatus 100 to capture the user's surroundings and the user body. According to an embodiment of the disclosure, since the electronic apparatus 100 captures the user's surroundings and the user body depending on a direction of the user's gaze, that is, a first person perspective, and provides an image generated based on the perspective, the electronic apparatus 100 may provide a realistic AR image.
According to an embodiment of the disclosure, the electronic apparatus 100 may include a single camera. By using a single camera, the electronic apparatus 100 may guarantee real-time image processing and may not require a high-performance device. The electronic apparatus 100 may estimate 3D coordinates of the user body by using a pre-trained learning model even when only a portion of the user body (e.g., a hand) is captured. In addition, the electronic apparatus 100 may estimate an exact location and motion of the user body despite using a single camera, and based on this, may provide the user with a service capable of interacting with a virtual object.
FIG. 2 is a block diagram illustrating a configuration of an electronic apparatus according to an embodiment of the disclosure.
Referring to FIG. 2, the electronic apparatus 100 may include a camera 110, a display 120, and a processor 130.
The camera 110 may capture the rear of the electronic apparatus 100 facing the front of the electronic apparatus 100 in which the display 120 displays an image (S210). The camera 110 may capture a space where the user exists and the user body. The camera 110 may be disposed on the rear or side of the electronic apparatus 100 to capture the rear of the electronic apparatus 100. Because the electronic apparatus 100 may be implemented as a wearable AR glasses device or a smartphone, etc., as illustrated in FIG. 1, the camera 110 may capture an image in a direction that varies with the user's movement and the user's gaze.
In addition, the camera 110 may be connected to the processor 130 in a wired or wireless communication method. The image captured by the camera 110 may be provided to the user in real time after a series of processing operations by the processor 130. In addition, the image captured by the camera 110 may be used as a basis for generating an AR image by the processor 130 described below. The image captured by the camera 110 may be an RGB image including RGB data. Alternatively, according to another embodiment of the disclosure, the camera 110 may be a 3D camera capable of acquiring depth data. The processor 130 may acquire the depth data from the image captured by the camera 110, and use the acquired depth data as a basis for generating an AR image.
The display 120 may be disposed on the front of the electronic apparatus 100. In addition, the display 120 may be connected to the processor 130 by wired or wireless, and the display 120 may display various information under the control of the processor 130. In particular, the display 120 may display an AR image generated by the processor 130 (S250). Since the display 120 displays the AR image generated based on the image captured by the camera 110 disposed on the rear of the electronic apparatus 100, the AR image displayed by the display 120 may be a first person perspective image.
In addition, the display 120 may be implemented in a form of a general display such as a Liquid Crystal Display (LCD), a Light Emitting Diode (LED), an Organic Light Emitting Diode (OLED), a Quantum dot Light Emitting Diode (QLED), etc., and according to another embodiment, the display 120 may also be implemented as a transparent display. Specifically, the display 120 is made of a transparent material, and light outside the electronic apparatus 100 may penetrate the display 120 to reach the user, and the user may observe the user body and external environment by penetrating the display 120. The transparent display may be implemented as a transparent liquid crystal display (LCD) type, a transparent thin-film electroluminescent panel (TFEL) type, a transparent Organic Light Emitting Diode (OLED) type, or the like, and may be implemented in a form of displaying by projecting an image on a transparent screen (e.g., head-up display (HUD)). When the display 120 is implemented as a transparent display, the processor 130 may control the display 120 such that only virtual objects are displayed on the display 120.
The processor 130 may control overall operations and functions of the electronic apparatus 100. In particular, the processor 130 may render a virtual object based on the image captured by the camera 110, estimate, when a user body is detected in the captured image, a plurality of joint coordinates of the detected user body using the pre-trained learning model, generate an AR image by using the estimated joint coordinates, the rendered virtual object, and the captured image, and control the display 120 to display the generated AR image.
The processor 130 may be electrically connected to the camera 110, and may receive data including the image captured by the camera 110 from the camera 110. The processor 130 may render a virtual object based on the image captured by the camera 110 (S220). Specifically, rendering may refer to generating a second image including a virtual object to correspond to a first image captured by the camera 110. In other words, rendering may mean generating a virtual object to correspond to a certain area of the captured image. Since the processor 130 renders a virtual object based on the captured image, the rendered virtual object may include depth information about space.
When the user body is detected from the captured image, the processor 130 may estimate a plurality of joint coordinates for the detected user body using a pre-trained learning model (S230). Specifically, the processor 130 may estimate the plurality of joint coordinates for the user body using RGB data included in the captured image. First, the processor 130 may detect the user body from the captured image and extract RGB data including the user body from the captured image. The processor 130 may then estimate motions, shapes, and predicted coordinates of the user body by inputting the extracted RGB data into the pre-trained learning model. According to another embodiment, the image captured by the camera 110 may include depth data, and the joint coordinates may be further refined using the depth data.
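For illustration only, the following sketch shows how such an estimation step might look if implemented as a small PyTorch CNN that regresses 21 three-dimensional joint coordinates from an RGB crop of the detected hand; the model name, layer sizes, and input resolution are assumptions and do not represent the disclosed model.

```python
# Illustrative sketch only (assumed architecture, not the disclosed model):
# a small CNN that regresses 21 x 3 joint coordinates from an RGB hand crop.
import torch
import torch.nn as nn

class HandJointRegressor(nn.Module):
    def __init__(self, num_joints: int = 21):
        super().__init__()
        self.num_joints = num_joints
        self.features = nn.Sequential(               # convolutional backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, num_joints * 3)   # (x, y, z) per joint

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        x = self.features(rgb).flatten(1)
        return self.head(x).view(-1, self.num_joints, 3)

# Inference on an RGB crop of the detected hand (placeholder input values).
model = HandJointRegressor().eval()
hand_crop = torch.rand(1, 3, 224, 224)               # normalized RGB data
with torch.no_grad():
    joints_3d = model(hand_crop)                     # shape: 1 x 21 x 3
```

In practice, the backbone, input size, and output parameterization would follow whatever pre-trained learning model is actually deployed on the apparatus.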
The pre-trained learning model may be a learning model trained through a plurality of learning data including a hand image using a convolutional neural network (CNN). When the user body is a hand, a method of estimating joint coordinates using a learning model will be described below in detail with reference to FIGS. 4A and 4B.
The processor 130 may generate an AR image using the estimated joint coordinates, the rendered virtual object, and the captured image (S240). The AR image may refer to a third image generated by matching or calibrating the first image captured by the camera and the second image including the virtual object.
The processor 130 may control the display 120 to display the generated AR image. When the AR image is displayed on the display 120, the user may interact with the virtual object through the AR image. Specifically, the processor 130 may check whether the user body touches the virtual object based on the estimated joint coordinates, and perform an event corresponding to the object touch when the object touch is detected. The event may mean changing a transmittance of the virtual object. The processor 130 may change an alpha value of the virtual object included in the generated AR image by a unit of pixel.
When a touch of the user body and the virtual object is identified, the processor 130 may change the transmittance of the virtual object (S260). Alternatively, the processor 130 may change only the transmittance of an area of the virtual object corresponding to the touch. In other words, the processor 130 may change the transmittance of the virtual object by changing alpha values of all pixels corresponding to the virtual object or by changing only an alpha value of the pixel in the area of the virtual object.
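As a non-authoritative sketch of this per-pixel alpha manipulation, the following NumPy function lowers the alpha channel of an RGBA virtual-object layer either for all object pixels or only for a touched region; the mask inputs and the [0, 1] alpha convention are assumptions.

```python
# Sketch under assumed conventions: an H x W x 4 RGBA layer for the rendered
# virtual object, alpha normalized to [0, 1], boolean masks for object/touch.
import numpy as np

def set_object_transmittance(layer_rgba: np.ndarray,
                             object_mask: np.ndarray,
                             alpha: float,
                             region_mask: np.ndarray = None) -> np.ndarray:
    out = layer_rgba.copy()
    # Either every object pixel or only the touched area gets the new alpha.
    target = object_mask if region_mask is None else (object_mask & region_mask)
    out[target, 3] = alpha
    return out
```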
Also, the processor 130 may identify an object touch based on whether the estimated joint coordinates are positioned at coordinates corresponding to the rendered virtual object. In addition, the processor 130 may track joint coordinates in real time or at a predetermined time interval through an image captured in real time by the camera 110. As described above with reference to FIG. 1, since only the image captured through one camera is used and only the plurality of joint coordinates for the user body are estimated by using RGB data of the captured image, the processor 130 may perform real-time image processing.
FIG. 3 is a block diagram illustrating a detailed configuration of an electronic apparatus according to an embodiment of the disclosure.
Referring to FIG 3, the electronic apparatus 100 may include the camera 110, the display 120, the processor 130, a communication interface 140, a memory 150, and a sensor 160. Meanwhile, since the camera 110, the display 120, and the processor 130 illustrated in FIG. 3 have been described in FIG. 2, redundant description will be omitted.
The communication interface 140 may communicate with an external apparatus (not shown). The communication interface 140 may be connected to an external device through communication via a third device (e.g., a repeater, a hub, an access point, a server, a gateway, etc.).
In addition, the communication interface 140 may include various communication modules to perform communication with an external device. Specifically, the communication interface 140 may include an NFC module, a wireless communication module, an infrared module, and a broadcast receiving module.
The communication interface 140 may receive information related to the operation of the electronic apparatus 100 from an external device. According to an embodiment, the electronic apparatus 100 may receive a pre-trained learning model from an external server or device through the communication interface 140, and may control the communication interface 140 so that coordinates of the user body are estimated using an external high-performance server or device. Further, the communication interface 140 may be used to update information stored in the electronic apparatus 100.
The memory 150, for example, may store a command or data regarding at least one of the other elements of the electronic apparatus 100. The memory 150 may be implemented as a non-volatile memory, a volatile memory, a flash memory, a hard disk drive (HDD) or a solid state drive (SSD). The memory 150 may be accessed by the processor 130, and perform readout, recording, correction, deletion, update, and the like, on data by the processor 130. According to an embodiment of the disclosure, the term of the storage may include the memory 150, read-only memory (ROM) (not illustrated) and random access memory (RAM) (not illustrated) within the processor 130, and a memory card (not illustrated) attached to the electronic apparatus 100 (e.g., micro secure digital (SD) card or memory stick). Also, the memory 150 may store a program, data, and the like for constituting various types of screens that will be displayed in the display area of the display 120.
In addition, the memory 150 may store data for displaying the AR image. Specifically, the memory 150 may store an image captured by the camera 110 and a second image including a virtual object generated by the processor 130. Also, the memory 150 may store the AR image generated based on the captured image and the rendered virtual object. Also, the memory 150 may store the plurality of joint coordinates of the user body estimated by the processor 130 in real time.
The sensor 160 may detect an object. Specifically, the sensor 160 may sense an object by sensing physical changes such as heat, light, temperature, pressure, sound, or the like. Also, the sensor 160 may output coordinate information about the sensed object. Specifically, the sensor 160 may acquire 3D point information of the sensed object or output coordinate information based on a distance.
For example, the sensor 160 may be a lidar sensor, a radar sensor, an infrared sensor, an ultrasonic sensor, a radio frequency (RF) sensor, a depth sensor, and a distance measurement sensor. The sensor 160 is a type of an active sensor and may transmit a specific signal to measure a time of flight (ToF). The ToF is a flight time distance measurement method, and may be a method of measuring a distance by measuring a time difference between a reference time point at which a pulse is fired and a detection time point of a pulse reflected back from a measurement object.
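For reference, the round-trip relation behind ToF ranging can be expressed in a few lines; the helper below is a generic illustration of that relation, not the actual firmware of the sensor 160.

```python
# Generic ToF relation (not specific to the disclosure):
# distance = speed_of_light * round_trip_time / 2.
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_distance(round_trip_seconds: float) -> float:
    # Half the round trip, because the pulse travels to the object and back.
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0
```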
FIGS. 4A and 4B are views illustrating a process of estimating a plurality of joint coordinates according to an embodiment of the disclosure.
Referring to FIG. 4A, a user body 40 and a plurality of joint coordinates 41 to 46 corresponding to a finger joint and a palm are illustrated.
Referring to FIG. 4B, hand images in which the plurality of joint coordinates 41 to 46 of FIG. 4A are connected are illustrated for a plurality of hand configurations.
If the user body detected from an image captured by the camera 110 is a hand, the electronic apparatus 100 may use a pre-trained learning model as a method of estimating a plurality of joint coordinates corresponding to a finger joint and a palm.
The learning model may be trained through a plurality of learning data or training data including human hand images using a convolutional neural network (CNN). The learning model may be trained based on data in which 3D coordinates are assigned to at least one region of a hand image and data in which the 3D coordinates are not matched to the hand image. First, the data in which 3D coordinates are assigned to at least one region of the hand image may be learned by the learning model. Subsequently, output data may be obtained from the learning model using the data in which the 3D coordinates are not matched to the hand image, and a loss function or error between the output data and the data in which 3D coordinates are assigned to at least one region of the hand image may be calculated. The learning model may be trained through a process of updating a weight value of the CNN using the calculated loss function or error.
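A minimal training step consistent with this description might look as follows, reusing the hypothetical joint regressor sketched earlier; the mean-squared-error loss and the optimizer are assumptions, chosen only to illustrate how the CNN weight values could be updated from the error between the model output and the coordinate-matched data.

```python
# Simplified supervised step (an interpretation, not the exact procedure):
# predictions for hand images are compared against their 3D joint labels and
# the CNN weight values are updated from the resulting loss.
import torch
import torch.nn.functional as F

def training_step(model, optimizer, hand_images, labeled_joints_3d):
    # hand_images: B x 3 x H x W RGB tensors; labeled_joints_3d: B x 21 x 3.
    optimizer.zero_grad()
    predicted = model(hand_images)                   # output data of the CNN
    loss = F.mse_loss(predicted, labeled_joints_3d)  # error vs. labeled coordinates
    loss.backward()
    optimizer.step()                                 # weight update
    return loss.item()
```

Here `model` could be the hypothetical HandJointRegressor from the earlier sketch, and `optimizer` could be, for example, `torch.optim.Adam(model.parameters())`.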
Referring again to FIG. 4A, the learning data or training data may be a hand image in which 3D coordinates are assigned to a plurality of joints including a fingertip and a finger joint.
Referring again to FIG. 4B, the learning model may be formed by machine learning, or may be trained through deep learning after inputting the 21 coordinates shown in FIG. 4A into each of the hand images including various hand shapes and sizes. In other words, the hand images illustrated in FIG. 4B may be learning data or training data for training the learning model.
FIG. 4A shows a user body 40 composed of 20 points corresponding to the fingers and one point corresponding to the palm, a total of 21 points. The 21 points included in the user body 40 may be the basis for estimating the plurality of joint coordinates for the user body.
For example, when the learning model is trained using experimental data or learning data that includes more than 21 points included in the user body 40 of FIG. 4A, the electronic apparatus 100 may more accurately grasp the user body joint coordinates and locations included in the captured image. However, the number of points included in the user body 40 of FIG. 4A is only an embodiment, and the number of points set according to the performance of the electronic apparatus 100 may be appropriately selected in the range of 5 to 40 during implementation.
As shown in FIG. 4B, the plurality of hand images may be learning data or training data for training the learning model. The electronic apparatus 100 may estimate the joint coordinates of the user by learning the hand images illustrated in FIG. 4B. In particular, the electronic apparatus 100 may estimate the coordinates of hidden fingers with only a part of the captured image, even when a part of the hand, for example, the fingers, is hidden by another object or subject.
For example, it may be assumed that the learning model learns the data of FIG. 4B and a finger image similar to the data of FIG. 4B is given. In this case, according to the prior art, RGB and depth data for the fingers hidden by another object or subject may not be obtained, and accurate coordinates for the hidden fingers may not be obtained with only given data. However, the electronic apparatus 100 according to the disclosure may estimate coordinates using a pre-trained learning model for fingers hidden by other objects or subjects. However, the above example is an example for convenience of description, and data related to all movements and postures of the hand are not required to estimate the coordinates of the hand according to an embodiment of the disclosure.
FIG. 5 is a view illustrating an AR image displayed by an electronic apparatus according to an embodiment of the disclosure.
Referring to FIG. 5, a first image 51 including a user body 50, a second image 52 including a virtual object 5, and a third image 53 including a user body 50 and a virtual object 5 are illustrated.
The first image 51 may be an image captured by a camera included in the electronic apparatus 100. Specifically, the rear of the electronic apparatus 100 may be captured by the camera included in the electronic apparatus 100, and the captured image may include the user body 50. The user body 50 captured by the camera is illustrated in the first image 51.
In addition, the second image 52 may be an image including the virtual object 5 rendered by the electronic apparatus 100 based on the captured image so as to correspond to the first image captured by the camera included in the electronic apparatus 100. Since the virtual object 5 rendered by the electronic apparatus 100 is generated based on the captured image, it may include depth information on space.
The third image 53 may be an image generated by the first image 51 and the second image 52 being matched and calibrated by the electronic apparatus 100.
Specifically, the electronic apparatus 100 may extract RGB data including the user body from the first image captured by the camera included in the electronic apparatus 100. Then, the electronic apparatus 100 may estimate motions, shapes, and predicted coordinates of the user body by inputting the extracted RGB data into the pre-trained learning model. Alternatively, the captured image may include depth data, and the electronic apparatus 100 may estimate the plurality of joint coordinates for the user body by using the depth data.
For example, the electronic apparatus 100 may detect a hand of the user body 50 included in the first image 51, and estimate coordinates of the user body 50 detected by using the pre-trained learning model. Alternatively, the electronic apparatus 100 may estimate the coordinates of the user body 50 by additionally using the depth data of the user body 50 from the captured image.
The electronic apparatus 100 may match or calibrate the first image 51 and the second image 52 by using the estimated joint coordinates of the user body 50 and the depth information of the virtual object 5. Since the electronic apparatus 100 may generate an augmented reality image using the estimated coordinates of the user body and the coordinates of the virtual object, the user may be able to interact with the virtual object displayed through the display of the electronic apparatus 100.
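A rough compositing sketch under simple assumptions (per-pixel depth maps for both the hand and the virtual object, and an alpha channel normalized to [0, 1]) is shown below; it is one possible way to match the first and second images, not the disclosed calibration itself.

```python
# Rough compositing sketch (assumed inputs, not the disclosed calibration):
# the virtual-object layer is alpha-blended over the captured frame, and
# pixels where the hand is closer than the object keep the real hand.
import numpy as np

def composite_ar_frame(captured_rgb: np.ndarray,   # H x W x 3 first image
                       object_rgba: np.ndarray,    # H x W x 4 second image, alpha in [0, 1]
                       object_depth: np.ndarray,   # H x W depth of the virtual object
                       hand_depth: np.ndarray) -> np.ndarray:  # H x W, np.inf where no hand
    alpha = object_rgba[..., 3:4].astype(np.float32)
    hand_in_front = (hand_depth < object_depth)[..., None]
    alpha = np.where(hand_in_front, 0.0, alpha)     # let the real hand show through
    blended = alpha * object_rgba[..., :3] + (1.0 - alpha) * captured_rgb
    return blended.astype(captured_rgb.dtype)
```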
FIG. 6 is a view illustrating a process of rendering a virtual hand object on a user body according to an embodiment of the disclosure.
Referring to FIG. 6, an augmented reality image 61 including a virtual hand object 60 is illustrated. As described above, the electronic apparatus 100 may estimate a plurality of joint coordinates for the user body, and render a virtual hand object based on the estimated plurality of joint coordinates.
Specifically, the electronic apparatus 100 may detect the user's hand from the captured image and estimate the plurality of joint coordinates for the detected hand. The electronic apparatus 100 may render a virtual hand object at coordinates corresponding to the estimated plurality of joint coordinates, and may output the virtual hand object instead of the user body. The electronic apparatus 100 may track movements of the user's hand in real time and superimpose the virtual hand object on the user's hand, so that the user may move the virtual hand object 60 in the augmented reality image 61 just like moving the user's hand, and may interact with the virtual object.
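As an illustration of overlaying a virtual hand object at the estimated joints, the following OpenCV sketch draws a skeleton at given 2D joint positions; the joint indexing and the list of bone index pairs are hypothetical.

```python
# Hypothetical overlay: drawing a virtual hand skeleton at estimated 2D joint
# positions with OpenCV; the bone index pairs are assumptions.
import cv2
import numpy as np

def draw_virtual_hand(frame: np.ndarray,
                      joints_2d: np.ndarray,        # N x 2 pixel coordinates
                      bones: list) -> np.ndarray:   # list of (i, j) joint index pairs
    out = frame.copy()
    for i, j in bones:
        p1 = tuple(int(v) for v in joints_2d[i])
        p2 = tuple(int(v) for v in joints_2d[j])
        cv2.line(out, p1, p2, (0, 255, 0), 2)       # bone segments
    for point in joints_2d:
        cv2.circle(out, tuple(int(v) for v in point), 4, (0, 0, 255), -1)  # joints
    return out
```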
When the electronic apparatus 100 detects an object touch, at least one of a user body or a virtual object may be transparently displayed. An embodiment in which the electronic apparatus 100 transparently displays a virtual object or a user body will be described with reference to FIGS. 7 and 9.
FIG. 7 is a view illustrating an event corresponding to an object touch according to an embodiment of the disclosure.
Referring to FIG. 7, an augmented reality image 71 including a virtual object 7 and a user body 70 is illustrated. The electronic apparatus 100 may identify whether the user body 70 touches the virtual object 7 based on the estimated joint coordinates, and perform an event corresponding to the object touch when the object touch is detected. The event may refer to changing a transmittance of the virtual object 7 or a color of the displayed virtual object 7. Meanwhile, the electronic apparatus 100 may change an alpha value of the virtual object 7 included in the generated augmented reality image 71 by a unit of pixel. When a touch of the user body 70 and the virtual object 7 is identified, the electronic apparatus 100 may change the transmittance of the virtual object 7. Alternatively, the electronic apparatus 100 may change only the transmittance of an area of the virtual object 7 corresponding to the touch. In other words, the electronic apparatus 100 may change the transmittance of the virtual object by changing the alpha values of all pixels corresponding to the virtual object 7 or by changing only the alpha values of pixels in one region of the virtual object 7.
FIG. 8 is a view illustrating an object touch according to an embodiment of the disclosure.
Referring to FIG. 8, an augmented reality image 81 including a virtual object 8 and user bodies 80a and 80b is illustrated. The electronic apparatus 100 may identify whether the user bodies 80a and 80b touch the virtual object 8 based on the estimated joint coordinates, and perform an event corresponding to the object touch when the object touch is detected. The event may refer to changing a transmittance of the rendered virtual hand object 80b.
Meanwhile, the electronic apparatus 100 may change an alpha value of the virtual hand object 80b included in the generated augmented reality image by a unit of pixel. The electronic apparatus 100 may change a transmittance of the virtual hand object 80b when a touch of the user bodies 80a and 80b and the virtual object 8 is identified. Alternatively, the electronic apparatus 100 may change only the transmittance of one area of the virtual hand object 80b corresponding to the touch.
Specifically, referring to FIG. 8, the virtual object 8 may be rendered by the electronic apparatus 100 and occupy a certain area in space (e.g., a cuboid of a certain size). In addition, the electronic apparatus 100 may determine the 3D coordinates of the virtual object 8 and also estimate the 3D coordinates of the user bodies 80a and 80b. In other words, the electronic apparatus 100 may identify whether the object is touched by comparing the 3D coordinates of the virtual object 8 and the user bodies 80a and 80b. In addition, the electronic apparatus 100 may identify whether a part of the user bodies 80a and 80b touches the virtual object 8. The electronic apparatus 100 may change a transmittance of the part of the user body (the virtual hand object 80b) determined to have touched the virtual object 8, or invert the color of the rendered virtual hand object 80b.
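Under the simplifying assumption that the virtual object occupies an axis-aligned cuboid, the touch identification described above reduces to a containment test such as the following sketch.

```python
# Containment test under the cuboid assumption: a touch is reported when any
# estimated 3D joint coordinate lies inside the virtual object's bounding box.
import numpy as np

def is_object_touched(joints_3d: np.ndarray,   # N x 3 estimated joint coordinates
                      box_min: np.ndarray,     # (3,) cuboid minimum corner
                      box_max: np.ndarray) -> bool:
    inside = np.all((joints_3d >= box_min) & (joints_3d <= box_max), axis=1)
    return bool(inside.any())
```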
FIG. 9 is a flowchart illustrating a method of controlling an electronic apparatus according to an embodiment of the disclosure.
Referring to FIG. 9, the electronic apparatus 100 may include a camera, a display, and a processor. The electronic apparatus 100 may capture a rear of the electronic apparatus 100 facing a front of the electronic apparatus 100 in which the display displays an image (S910). The front and the rear are named for convenience of description, and may alternatively be referred to as a first surface and a second surface.
The electronic apparatus 100 may render a virtual object based on the captured image (S920). Specifically, when an image captured by the camera is referred to as a first image or a first layer, the electronic apparatus 100 may generate a second image or a second layer including a virtual object based on the first image or the first layer.
When the user body is detected from the captured image, the electronic apparatus 100 may estimate a plurality of joint coordinates for the detected user body using a pre-trained learning model (S930). Specifically, when the user body detected from the captured image is a hand, the electronic apparatus 100 may estimate a plurality of joint coordinates corresponding to the finger joint and the palm by using the pre-trained learning model. The pre-trained learning model may be learned through a plurality of learning data including hand images.
The electronic apparatus 100 may generate an augmented reality image using the estimated plurality of joint coordinates, the rendered virtual object, and the captured image (S940). The electronic apparatus 100 may display the generated augmented reality image (S950). The electronic apparatus 100 may identify whether the user body touches the virtual object in the displayed augmented reality image, and perform an event corresponding to the object touch when the object touch is detected. In particular, the electronic apparatus 100 may identify whether the user body touches the virtual object based on the estimated joint coordinates, and change a transmittance of the virtual object when the touch is confirmed (S960). The electronic apparatus 100 may change an alpha value of the virtual object included in the generated augmented reality image by a unit of pixel, and the electronic apparatus 100 may identify an object touch based on whether the estimated joint coordinates are located at coordinates corresponding to the rendered virtual object. Also, the electronic apparatus 100 may change only the transmittance of an area of the virtual object corresponding to the touch. In other words, the electronic apparatus 100 may change the transmittance of the virtual object by changing the alpha value of all pixels corresponding to the virtual object or by changing only the pixel alpha value of the area of the virtual object.
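Purely as an orientation aid, the control flow of FIG. 9 can be summarized by the following loop; every helper callable passed in is hypothetical, and the ordering of the transmittance change relative to the display step is simplified for readability.

```python
# Orientation-only sketch of the flow of FIG. 9; the camera, display, and all
# callables in `helpers` are hypothetical stand-ins, not a disclosed API.
def ar_control_loop(camera, display, joint_model, helpers):
    # `helpers` is a simple namespace of hypothetical callables:
    # render_virtual_object, detect_hand, estimate_joints,
    # is_object_touched, change_transmittance, compose_ar_image.
    while True:
        frame = camera.capture_frame()                                   # S910
        object_layer, object_box = helpers.render_virtual_object(frame)  # S920
        joints_3d = None
        hand_crop = helpers.detect_hand(frame)
        if hand_crop is not None:
            joints_3d = helpers.estimate_joints(joint_model, hand_crop)  # S930
            if helpers.is_object_touched(joints_3d, *object_box):        # touch check
                object_layer = helpers.change_transmittance(object_layer)  # S960
        ar_image = helpers.compose_ar_image(frame, object_layer, joints_3d)  # S940
        display.show(ar_image)                                           # S950
```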
The term “module” as used herein includes units made up of hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic blocks, components, or circuits. A “module” may be an integrally constructed component or a minimum unit or part thereof that performs one or more functions. For example, the module may be composed of an application-specific integrated circuit (ASIC).
The various example embodiments described above may be implemented as an S/W program including an instruction stored on machine-readable (e.g., computer-readable) storage media. The machine is an apparatus which is capable of calling a stored instruction from the storage medium and operating according to the called instruction, and may include an electronic apparatus (e.g., an electronic apparatus A) according to the above-described example embodiments. When the instruction is executed by a processor, the processor may perform a function corresponding to the instruction directly or using other components under the control of the processor. The command may include a code generated or executed by a compiler or an interpreter. A machine-readable storage medium may be provided in the form of a non-transitory storage medium. Herein, the term “non-transitory” only denotes that a storage medium does not include a signal but is tangible, and does not distinguish the case where a data is semi-permanently stored in a storage medium from the case where a data is temporarily stored in a storage medium.
The respective components (e.g., module or program) according to the various example embodiments may include a single entity or a plurality of entities, and some of the corresponding sub-components described above may be omitted, or another sub-component may be further added to the various example embodiments. Alternatively or additionally, some components (e.g., module or program) may be combined to form a single entity which performs the same or similar functions as the corresponding elements before being combined. Operations performed by a module, a program module, or other component, according to various embodiments, may be sequential, parallel, or both, executed iteratively or heuristically, or at least some operations may be performed in a different order, omitted, or other operations may be added.

Claims (15)

  1. An electronic apparatus comprising:
    a display;
    a camera configured to capture a rear of the electronic apparatus facing a front of the electronic apparatus in which the display displays an image; and
    a processor configured to:
    render a virtual object based on the image captured by the camera,
    based on a user body being detected from the captured image, estimate a plurality of joint coordinates with respect to the detected user body using a pre-trained learning model,
    generate an augmented reality image using the estimated plurality of joint coordinates, the rendered virtual object, and the captured image, and
    control the display to display the generated augmented reality image,
    wherein the processor is further configured to:
    identify whether the user body touches the rendered virtual object based on the estimated plurality of joint coordinates, and
    change a transmittance of the rendered virtual object based on the touch being identified.
  2. The apparatus of claim 1, wherein the processor is further configured to estimate the plurality of joint coordinates corresponding to a finger joint and a palm using the pre-trained learning model based on the detected user body being identified to be a hand.
  3. The apparatus of claim 2, wherein the processor is further configured to render a virtual hand object and the rendered virtual object based on the estimated plurality of joint coordinates.
  4. The apparatus of claim 1, wherein the processor is further configured to change a transmittance of one area of the rendered virtual object corresponding to the touch.
  5. The apparatus of claim 1, wherein the processor is further configured to change a transmittance of the user body and transparently display the user body, based on the touch being identified.
  6. The apparatus of claim 1, wherein the processor is further configured to:
    receive depth data of the captured image from the camera, and
    generate the augmented reality image using the received depth data.
  7. The apparatus of claim 1, wherein the pre-trained learning model is configured to be trained through a plurality of learning data comprising hand images by using a convolutional neural network (CNN).
  8. The apparatus of claim 7, wherein the plurality of learning data comprises first data in which a 3D coordinate is matched to at least one area of a hand image, and second data in which the 3D coordinate is not matched to the hand image, and
    wherein the pre-trained learning model is configured to be trained by updating a weight value of the CNN based on the first data and the second data.
  9. A method of controlling an electronic apparatus comprising:
    capturing a rear of the electronic apparatus facing a front of the electronic apparatus in which a display displays an image;
    rendering a virtual object based on a captured image;
    based on a user body being detected from the captured image, estimating a plurality of joint coordinates with respect to the detected user body using a pre-trained learning model;
    generating an augmented reality image using the estimated plurality of joint coordinates, the rendered virtual object, and the captured image;
    displaying the generated augmented reality image;
    identifying whether the user body touches the rendered virtual object based on the estimated plurality of joint coordinates; and
    changing a transmittance of the rendered virtual object based on the touch being identified.
  10. The method of claim 9, wherein the estimating comprises estimating a plurality of joint coordinates corresponding to a finger joint and a palm using the pre-trained learning model based on the detected user body being identified to be a hand.
  11. The method of claim 10, wherein the rendering comprises rendering a virtual hand object and the rendered virtual object based on the estimated plurality of joint coordinates.
  12. The method of claim 9, wherein the changing comprises changing a transmittance of one area of the rendered virtual object corresponding to the touch.
  13. The method of claim 9, further comprising:
    changing a transmittance of the user body and transparently displaying the user body based on the touch being identified.
  14. The method of claim 9, wherein the generating comprises receiving depth data of the captured image from a camera, and generating the augmented reality image using the received depth data.
  15. The method of claim 9, wherein the pre-trained learning model is configured to be trained through a plurality of learning data comprising hand images by using a convolutional neural network (CNN).
PCT/KR2020/014721 2019-12-03 2020-10-27 Electronic apparatus and method for controlling thereof WO2021112406A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP20897014.5A EP4004878A4 (en) 2019-12-03 2020-10-27 Electronic apparatus and method for controlling thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020190159394A KR102702585B1 (en) 2019-12-03 2019-12-03 Electronic apparatus and Method for controlling the display apparatus thereof
KR10-2019-0159394 2019-12-03

Publications (1)

Publication Number Publication Date
WO2021112406A1 true WO2021112406A1 (en) 2021-06-10

Family

ID=76091642

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/014721 WO2021112406A1 (en) 2019-12-03 2020-10-27 Electronic apparatus and method for controlling thereof

Country Status (4)

Country Link
US (1) US11514650B2 (en)
EP (1) EP4004878A4 (en)
KR (1) KR102702585B1 (en)
WO (1) WO2021112406A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109799073B (en) * 2019-02-13 2021-10-22 京东方科技集团股份有限公司 Optical distortion measuring device and method, image processing system, electronic equipment and display equipment
US11334212B2 (en) 2019-06-07 2022-05-17 Facebook Technologies, Llc Detecting input in artificial reality systems based on a pinch and pull gesture
US11307653B1 (en) * 2021-03-05 2022-04-19 MediVis, Inc. User input and interface design in augmented reality for use in surgical settings
US11656690B2 (en) 2021-03-05 2023-05-23 MediVis, Inc. User input and virtual touch pad in augmented reality for use in surgical settings
US11429247B1 (en) 2021-03-05 2022-08-30 MediVis, Inc. Interactions with slices of medical data in augmented reality
KR102539047B1 (en) * 2021-06-04 2023-06-02 주식회사 피앤씨솔루션 Method and apparatus for improving hand gesture and voice command recognition performance for input interface of ar glasses device
KR102548208B1 (en) * 2021-06-04 2023-06-28 주식회사 피앤씨솔루션 Lightweight hand joint prediction method and apparatus for real-time hand motion interface implementation of ar glasses device
KR102483387B1 (en) * 2021-09-29 2022-12-30 한국기술교육대학교 산학협력단 Augmented reality content provision method and finger rehabilitation training system for finger rehabilitation training
WO2023063570A1 (en) * 2021-10-14 2023-04-20 삼성전자 주식회사 Electronic device for obtaining image data relating to hand motion and method for operating same
KR20230170485A (en) * 2022-06-10 2023-12-19 삼성전자주식회사 An electronic device for obtaining image data regarding hand gesture and a method for operating the same
WO2024071718A1 (en) * 2022-09-28 2024-04-04 삼성전자 주식회사 Electronic device for supporting augmented reality function and operating method thereof
US20240153215A1 (en) * 2022-11-08 2024-05-09 International Business Machines Corporation Iterative virtual reality modeling amelioration for three-dimensional to-be-printed objects and virtual reality filament sections
US11991222B1 (en) 2023-05-02 2024-05-21 Meta Platforms Technologies, Llc Persistent call control user interface element in an artificial reality environment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120249590A1 (en) * 2011-03-29 2012-10-04 Giuliano Maciocci Selective hand occlusion over virtual projections onto physical surfaces using skeletal tracking
US20140306891A1 (en) 2013-04-12 2014-10-16 Stephen G. Latta Holographic object feedback
US20180024641A1 (en) 2016-07-20 2018-01-25 Usens, Inc. Method and system for 3d hand skeleton tracking
WO2018071225A1 (en) * 2016-10-14 2018-04-19 Microsoft Technology Licensing, Llc Modifying hand occlusion of holograms based on contextual information
KR20180097949A (en) * 2017-02-24 2018-09-03 오치민 The estimation and refinement of pose of joints in human picture using cascade stages of multiple convolutional neural networks
US10134166B2 (en) * 2015-03-24 2018-11-20 Augmedics Ltd. Combining video-based and optic-based augmented reality in a near eye display

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997020305A1 (en) * 1995-11-30 1997-06-05 Virtual Technologies, Inc. Tactile feedback man-machine interface device
US20100053151A1 (en) 2008-09-02 2010-03-04 Samsung Electronics Co., Ltd In-line mediation for manipulating three-dimensional content on a display device
US8941559B2 (en) * 2010-09-21 2015-01-27 Microsoft Corporation Opacity filter for display device
KR101669897B1 (en) * 2014-03-28 2016-10-31 주식회사 다림비젼 Method and system for generating virtual studio image by using 3-dimensional object modules
KR101687017B1 (en) 2014-06-25 2016-12-16 한국과학기술원 Hand localization system and the method using head worn RGB-D camera, user interaction system
KR102333568B1 (en) * 2015-01-28 2021-12-01 세창인스트루먼트(주) Interior design method by means of digital signage to which augmented reality is applied
JP6597235B2 (en) 2015-11-30 2019-10-30 富士通株式会社 Image processing apparatus, image processing method, and image processing program
US10665019B2 (en) 2016-03-24 2020-05-26 Qualcomm Incorporated Spatial relationships for integration of visual images of physical environment into virtual reality
JP2017182532A (en) * 2016-03-31 2017-10-05 ソニー株式会社 Information processing apparatus, display control method, and program
KR101724360B1 (en) * 2016-06-30 2017-04-07 재단법인 실감교류인체감응솔루션연구단 Mixed reality display apparatus
CN109952551A (en) * 2016-09-16 2019-06-28 触觉实验室股份有限公司 Touch sensitive keyboard
WO2018187171A1 (en) * 2017-04-04 2018-10-11 Usens, Inc. Methods and systems for hand tracking
CN110520822B (en) * 2017-04-27 2023-06-27 索尼互动娱乐股份有限公司 Control device, information processing system, control method, and program
KR101961221B1 (en) * 2017-09-18 2019-03-25 한국과학기술연구원 Method and system for controlling virtual model formed in virtual space
WO2019087564A1 (en) * 2017-11-01 2019-05-09 ソニー株式会社 Information processing device, information processing method, and program
WO2019123530A1 (en) * 2017-12-19 2019-06-27 株式会社ソニー・インタラクティブエンタテインメント Information processing device, information processing method, and program
JP2018110871A (en) 2018-02-07 2018-07-19 株式会社コロプラ Information processing method, program enabling computer to execute method and computer
JP6525179B1 (en) * 2018-04-11 2019-06-05 株式会社アジラ Behavior estimation device
US11023047B2 (en) * 2018-05-01 2021-06-01 Microsoft Technology Licensing, Llc Electrostatic slide clutch with bidirectional drive circuit
JP6518931B1 (en) 2018-10-17 2019-05-29 株式会社Rockin′Pool Virtual space display system
CN109993108B (en) 2019-03-29 2019-12-03 济南大学 Gesture error correction method, system and device under a kind of augmented reality environment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120249590A1 (en) * 2011-03-29 2012-10-04 Giuliano Maciocci Selective hand occlusion over virtual projections onto physical surfaces using skeletal tracking
US20140306891A1 (en) 2013-04-12 2014-10-16 Stephen G. Latta Holographic object feedback
US10134166B2 (en) * 2015-03-24 2018-11-20 Augmedics Ltd. Combining video-based and optic-based augmented reality in a near eye display
US20180024641A1 (en) 2016-07-20 2018-01-25 Usens, Inc. Method and system for 3d hand skeleton tracking
WO2018071225A1 (en) * 2016-10-14 2018-04-19 Microsoft Technology Licensing, Llc Modifying hand occlusion of holograms based on contextual information
KR20180097949A (en) * 2017-02-24 2018-09-03 오치민 The estimation and refinement of pose of joints in human picture using cascade stages of multiple convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4004878A4

Also Published As

Publication number Publication date
KR102702585B1 (en) 2024-09-04
KR20210069491A (en) 2021-06-11
EP4004878A1 (en) 2022-06-01
US20210166486A1 (en) 2021-06-03
EP4004878A4 (en) 2022-09-21
US11514650B2 (en) 2022-11-29

Similar Documents

Publication Publication Date Title
WO2021112406A1 (en) Electronic apparatus and method for controlling thereof
WO2018128355A1 (en) Robot and electronic device for performing hand-eye calibration
WO2018012945A1 (en) Method and device for obtaining image, and recording medium thereof
WO2019088667A1 (en) Electronic device for recognizing fingerprint using display
WO2020105863A1 (en) Electronic device including camera module in display and method for compensating for image around camera module
WO2015122565A1 (en) Display system for displaying augmented reality image and control method for the same
EP3134764A1 (en) Head mounted display and method for controlling the same
EP3776469A1 (en) System and method for 3d association of detected objects
WO2019194606A1 (en) Electronic device including bendable display
WO2015030307A1 (en) Head mounted display device and method for controlling the same
WO2019066273A1 (en) Electronic blackboard and control method thereof
WO2016089114A1 (en) Method and apparatus for image blurring
WO2016072785A1 (en) Direction based electronic device for displaying object and method thereof
WO2019160325A1 (en) Electronic device and control method thereof
WO2019103427A1 (en) Method, device, and recording medium for processing image
WO2014073939A1 (en) Method and apparatus for capturing and displaying an image
WO2021158055A1 (en) Electronic device including touch sensor ic and operation method thereof
WO2019004754A1 (en) Augmented reality advertisements on objects
CN114116435A (en) Game testing method, system, electronic device and computer readable storage medium
WO2015009112A9 (en) Method and apparatus for displaying images in portable terminal
WO2022098164A1 (en) Electronic device and control method of electronic device
WO2013022154A1 (en) Apparatus and method for detecting lane
WO2022097805A1 (en) Method, device, and system for detecting abnormal event
WO2019231119A1 (en) Apparatus and method for congestion visualization
WO2019245102A1 (en) Method and electronic device for displaying content

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20897014

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020897014

Country of ref document: EP

Effective date: 20220222

NENP Non-entry into the national phase

Ref country code: DE