KR102044003B1 - Electronic apparatus for a video conference and operation method therefor - Google Patents
- Publication number
- KR102044003B1 (application number KR1020170155550A)
- Authority
- KR
- South Korea
- Prior art keywords
- image
- user
- information
- face
- camera
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
- H04N7/157—Conference systems defining a virtual conference space and using avatars or agents
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
- H04N13/332—Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/142—Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Controls And Circuits For Display Device (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
A first electronic device and a method of operating the same are disclosed. The first electronic device may include: a first camera that generates a first image of a first user; a receiver configured to receive, from a second camera worn by the first user, a second image of a specific area of the first user that is not captured by the first camera; a processor configured to detect a face image of the first user from the first image, generate a third image, which is a 3D stereoscopic image corresponding to the face of the first user, based on the first image, and generate a composite image by synthesizing the third image and the second image; and a transmitter configured to transmit the composite image to another electronic device. Accordingly, the first electronic device may provide a vivid expression of the first user to the second electronic device.
Description
The present invention relates to an image processing technology, and more particularly, to an electronic device for a video conference and a method of operating the same.
A video conference system can acquire video and audio signals for each user through each camera and microphone located remotely. The video conferencing system may transmit the acquired video and audio signals to each other user via a network. In addition, the video conferencing system may transmit each received video and audio signal to each user through each display and speaker.
The user may wear conference equipment, such as a headphone-microphone set or a head mounted display (HMD). The video conferencing system may provide an image of a user wearing the conference equipment to another user. In this case, the other user's sense of immersion in the video conference may be reduced by the conference equipment worn by the user shown in the video.
An object of the present invention for solving the above problems is to provide an electronic device and method for video conferencing that transmit and display an image in which the image of a user wearing conference equipment is replaced with an image of the user not wearing the conference equipment.
According to an aspect of the present invention, there is provided a first electronic device including: a first camera that generates a first image of a first user; a receiver configured to receive, from a second camera worn by the first user, a second image of a specific area of the first user that is not captured by the first camera; a processor configured to detect a face image of the first user from the first image, generate a third image, which is a 3D stereoscopic image corresponding to the face of the first user, based on the first image, and generate a composite image by synthesizing the third image and the second image; and a transmitter configured to transmit the composite image to another electronic device.
The processor may generate position information and size information of the face image of the first user from the first image, detect a second camera image, which is an image of the portion of the first user's face where the second camera is worn, and generate position information and size information of the second camera image.
The processor may generate a face image model of the first user based on the position information and size information of the face image of the first user, determine whether a face image of the first user exists in a frame of the first image based on the face image model, and generate face direction information of the first user based on the face image model.
The processor may generate a second camera image model based on the position information and size information of the second camera image, and determine whether the second camera image exists in a frame of the first image based on the second camera image model.
The processor may generate 3D image information on the face of the first user based on the first image, and generate the third image corresponding to the direction of the first user's face based on the direction information of the face image of the first user and the 3D image information.
The processor may generate first image illuminance information on the average illuminance of the first image, second image illuminance information on the average illuminance of the second image, and third image illuminance information on the average illuminance of the third image.
The processor may change the second image illuminance information and the third image illuminance information to correspond to the first image illuminance information based on the first image illuminance information.
The processor may correct the distortion of the second image whose illuminance information has been changed, and change the direction of the second image to correspond to the direction of the first user's face based on the direction information of the face image of the first user.
The processor may generate a changed image in which the face image of the first user is replaced with the third image, and generate the composite image by synthesizing the changed image and the second image based on the position information, size information, and direction information of the face image of the first user and the position information and direction information of the second camera image.
The processor may monitor a face image of the first user in the first image and update the composite image based on the monitoring result.
According to an embodiment of the present disclosure, a method of operating a first electronic device may include: generating a first image of a first user by using a first camera; receiving, from a second camera worn by the first user, a second image of a specific area of the first user that is not captured by the first camera; detecting a face image of the first user from the first image; generating a third image, which is a 3D stereoscopic image corresponding to the face of the first user, based on the first image; generating a composite image by synthesizing the third image and the second image; and transmitting the composite image to another electronic device.
The detecting of the face image of the first user may include: generating position information and size information of the face image of the first user from the first image; detecting a second camera image, which is an image of the portion where the second camera is worn, from the face image of the first user; and generating position information and size information of the second camera image.
The detecting of the face image of the first user may include generating a face image model of the first user based on location information and size information of the face image of the first user; Determining whether a face image of the first user exists in a frame of the first image based on the face image model of the first user; And generating facial part direction information of the first user based on the facial part image model of the first user.
The detecting of the face image of the first user may include generating the second camera image model based on location information and size information of the second camera image; And determining whether the second camera image exists in a frame of the first image based on the second camera image model.
The generating of the third image may include generating three-dimensional image information of the face part of the first user based on the first image; And generating the third image corresponding to the direction of the face part of the first user based on the direction information of the face part image of the first user and the 3D image information.
Generating a third image may include generating first image illumination information regarding an average illumination of the first image; Generating second image illuminance information regarding an average illuminance of the second image; And generating third image illumination information regarding an average illumination of the third image.
The generating of the third image may further include changing the second image illuminance information and the third image illuminance information to correspond to the first image illuminance information, based on the first image illuminance information.
The generating of the third image may include correcting a distortion of the second image having the illumination information changed; And changing the direction of the second image to correspond to the direction of the face part of the first user based on the direction information of the face part image of the first user.
The generating of the composite image may include: generating a changed image in which the face image of the first user is replaced with the third image; and generating the composite image by combining the changed image and the second image based on the position information, size information, and direction information of the face image of the first user and the position information and direction information of the second camera image.
The generating of the synthesized image may include: monitoring a face image of the first user in the first image; And updating the composite image based on the monitoring result.
According to the present invention, by providing an electronic device and method for video conferencing that transmit and output an image in which the image of a user wearing conference equipment is replaced with an image of the user not wearing the conference equipment, an improved sense of immersion can be provided to the other users participating in the video conference.
FIG. 1A is a conceptual diagram of a video conferencing system providing a user image, according to an exemplary embodiment.
FIG. 1B is a conceptual diagram of a video conferencing system providing a synthesized user image, according to an exemplary embodiment.
FIG. 2 is a block diagram illustrating a configuration of a video conferencing system according to an exemplary embodiment.
FIG. 3 is a block diagram illustrating a configuration of an electronic device according to an embodiment.
FIG. 4 is a block diagram illustrating a configuration of a camera according to an embodiment.
FIG. 5 is a block diagram illustrating a configuration of a head mounted display according to an embodiment.
FIG. 6 is a block diagram illustrating a configuration of an image processing apparatus according to an exemplary embodiment.
FIG. 7 is a flowchart illustrating an operation sequence of an electronic device, according to an exemplary embodiment.
FIG. 8A is a flowchart illustrating an operation sequence of an electronic device for detecting an object and estimating a direction of a user's face, according to an exemplary embodiment.
FIG. 8B is a flowchart illustrating an operation sequence of an electronic device for generating a composite image, according to an exemplary embodiment.
As the present invention allows for various changes and numerous embodiments, particular embodiments will be illustrated in the drawings and described in detail in the written description. However, this is not intended to limit the present invention to specific embodiments, and the present invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and scope.
Terms such as first and second may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component. The term and / or includes a combination of a plurality of related items or any item of a plurality of related items.
When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may be present in between. On the other hand, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component is present in between.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, the terms "comprise" and "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification exists, and should not be understood to exclude the possibility of the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by one of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be construed as having meanings consistent with their meanings in the context of the related art, and shall not be construed in ideal or excessively formal senses unless expressly so defined in this application.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the following description, the same reference numerals are used for the same elements in the drawings, and redundant descriptions of the same elements are omitted.
FIG. 1A is a conceptual diagram of a video conferencing system providing a user image, according to an exemplary embodiment.
Referring to FIG. 1A, a video conferencing system may include a first user system 110 and a second user system 120. The first user system 110 and the second user system 120 may transmit and receive real-time video and audio data of each user through a network or point-to-point communication through a server (not shown).
The first user system 110 may photograph the first user.
For example, the first user system 110 may acquire a real-time image and audio of the first user.
The first user system 110 may transmit the real-time image and audio of the first user to the second user system 120.
In this case, each of the first user system 110 and the second user system 120 may be disposed in a separate video conference room (not shown) to improve immersion in the video conference and to provide vivid video and audio. In addition, the first user system 110 and the second user system 120 may utilize augmented reality (AR) and virtual reality (VR) technologies to reduce the equipment cost of the video conference room. For example, the first user system 110 and the second user system 120 may each replace the user's image with a virtual avatar in a video conference room represented as a two-dimensional or three-dimensional virtual space, and express the user's motions and emotions through the avatar.
The first user may wear the first head mounted display.
The first head mounted display may photograph the face of the first user through an internal camera.
In addition, the first head mounted display may transmit the captured image of the first user's face to the first user system 110.
The first head mounted display may provide the first user with the video and audio of the video conference.
Based on the image signal received from the first user system 110, the second user system 120 may display an image of the first user to the second user.
FIG. 1B is a conceptual diagram of a video conferencing system providing a synthesized user image, according to an exemplary embodiment.
Referring to FIG. 1B, the first user system 110 may replace the first head mounted display portion of the image of the first user with an image of the first user's face, and transmit the resulting image to the second user system 120.
In a general video conferencing system or a virtual video conferencing system, the first user system 110 may capture an image of the first user before the first user joins the video conference, that is, before the first user wears the conference equipment.
That is, the first user system 110 may synthesize a part of the image captured before the conference equipment was worn onto the area of the first user's image covered by the conference equipment.
FIG. 2 is a block diagram illustrating a configuration of a video conferencing system according to an exemplary embodiment.
The video conferencing system according to an embodiment may be a virtual video conferencing system that provides a video conference to a user using a head mounted display in a virtual video conference space using augmented reality and virtual reality.
Referring to FIG. 2, the video conferencing system may include a
The
Each of the first and second head mounted displays may include a user face photographing camera.
The first user face photographing camera included in the first head mounted display may photograph the face of the first user.
FIG. 3 is a block diagram illustrating a configuration of an electronic device according to an embodiment.
Referring to FIG. 3, the
The
The
The
The
The
The
The
The
The
In addition, although not shown, when the
FIG. 4 is a block diagram illustrating a configuration of a camera according to an embodiment.
Referring to FIG. 4,
The
The
The
The
The
FIG. 5 is a block diagram illustrating a configuration of a head mounted display according to an embodiment.
Referring to FIG. 5, the head mounted
The
The
The interior of the head mounted
The
In addition, there may be at least one or more optical sensors fixed to the head mounted
The
The
The
The
The
Any operation or method for the head mounted
The
The head mounted
FIG. 6 is a block diagram illustrating a configuration of an image processing apparatus according to an exemplary embodiment.
Referring to FIG. 6, the
The
The
The
The object
The
The
The
The 3D scanning
Also, the 3D scanning
The
FIG. 7 is a flowchart illustrating an operation sequence of an electronic device, according to an exemplary embodiment.
Referring to FIG. 7, the electronic device may generate a captured image (S701).
The electronic device may operate in the same or similar manner as the first user system 110 and the second user system 120 described above. The electronic device may generate a first image of the user through a camera.
In this case, the user may wear an object such as conference equipment, glasses, or a hat. In addition, the user may wear a head mounted display. The electronic device may photograph the user wearing the head mounted display.
The electronic device may generate a second image of the face of the user. The user may wear a head mounted display. The head mounted display may comprise a camera. The head mounted display may photograph the face of the user through a camera. The head mounted display may transmit an image of the face of the user to the electronic device. The electronic device may generate a second image of the face part of the user captured by the camera of the head mounted display. In addition, the electronic device may generate a second image of the face of the user through a separate external camera.
The electronic device may photograph a user who has not worn an object. The electronic device may generate a third image by photographing a face of a user who has not worn an object such as glasses or a hat through a camera. The electronic device may three-dimensionally process the third image by the image processing device. The electronic device may generate a 3D image through the image processing device. The electronic device may generate an image or image information having a predetermined angular interval according to the x-axis, y-axis, and z-axis directions of the 3D image through the image processing apparatus. The electronic device may display the first to third images on the display of the electronic device. The electronic device may transmit the first to third images to another electronic device.
The electronic device may detect an object worn by the user and determine a direction of the face part of the user (S702).
The electronic device may detect an object worn by the user in the first image. For example, the electronic device may detect an object such as glasses, a hat, or a head mounted display worn by the user in the first image. The electronic device may determine the position and size of the object. The electronic device may determine whether the object is the same as a predefined object. In addition, the electronic device may determine the direction of the user's face based on the third image.
The electronic device may generate an image for a video call or a video conference (S703).
The electronic device may correct distortion and adjust illuminance of the second image according to the characteristics of the lens. The electronic device may determine, from the images or image data at predetermined angular intervals included in the third image, the one that matches the direction of the user's current face. In addition, the electronic device may process the determined image or image data.
The electronic device may replace the image of the object worn by the user with another image. The electronic device may replace the image of the portion where the user and the object overlap with another image. For example, based on the second image and the third image, the electronic device may replace the image of the portion where the user and the worn object overlap with an image of the user from before the object was worn. The electronic device may arrange the changed user image in the virtual video conference space based on predefined layout information.
FIG. 8A is a flowchart illustrating an operation sequence of an electronic device for detecting an object and estimating a direction of a user's face, according to an exemplary embodiment.
The electronic device may operate in the same or similar manner as the first user system 110 described above.
Referring to FIG. 8A, the electronic device may receive a first image and a second image (S801).
The electronic device may receive the first image from the first camera. The first camera may be located in front of the user to photograph the user. The first camera may generate a first image of the user and the background of the user. The first camera may transmit the first image to the electronic device. The electronic device may include a first camera. Alternatively, the first camera may be a separate external device connectable with the electronic device.
The electronic device may receive the second image from the head mounted display. The user may wear a head mounted display on a portion of the user's head or face. The head mounted display may include a second camera. The second camera may photograph the face of the user. The head mounted display may generate a second image of the face of the user. The head mounted display may transmit the second image to the electronic device.
The electronic device may determine whether to detect the object (S802).
For example, the electronic device may determine whether to detect the head mounted display. If the head mounted display is not detected, the electronic device may proceed to detect the head mounted display in the first image. Alternatively, when the head mounted display is detected, the electronic device may skip the object detection step and proceed to track the object in the first image.
The electronic device may detect an object in a frame of the first image (S803).
For example, the electronic device may detect a head mounted display object worn by a user in a specific frame of the first image.
The electronic device may generate position and size information of the head mounted display. For example, the electronic device may generate location information including x-axis and y-axis coordinate information of the head mounted display. In addition, the electronic device may generate size information including size information on the x-axis and size information on the y-axis of the head mounted display.
A full search for the user's face image over the entire first image or second image may require a complicated calculation process. To reduce this computation, the electronic device may first detect the user's face area based on a user face detection algorithm, and then search for the coordinates of the head mounted display only within the detected face area. In this case, the electronic device may generate training data for a region-based convolutional neural network (CNN) based on the plurality of user face images or image data included in the second image. A convolutional neural network is a deep neural network model widely used in applications such as object classification and object detection in images, and has a structure suitable for learning two-dimensional data. The convolutional neural network may be trained through a backpropagation algorithm.
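The region-restricted search described above can be sketched as follows. This is a minimal illustration on tiny grayscale grids, using a sum-of-absolute-differences matcher; the function names, grid sizes, and matching criterion are assumptions for illustration, not the patent's implementation (which may instead use the CNN-based detector mentioned above).

```python
# Illustrative sketch: search for a worn-device template only inside a
# previously detected face region, instead of scanning the whole frame.

def sad(patch, template):
    """Sum of absolute differences between two equally sized patches."""
    return sum(
        abs(patch[y][x] - template[y][x])
        for y in range(len(template))
        for x in range(len(template[0]))
    )

def search_in_roi(frame, template, roi):
    """Find the best template match inside roi = (x0, y0, w, h).

    Returns ((x, y), score): top-left corner of the best match in
    frame coordinates and its SAD score (lower is better).
    """
    x0, y0, w, h = roi
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, float("inf")
    for y in range(y0, y0 + h - th + 1):
        for x in range(x0, x0 + w - tw + 1):
            patch = [row[x:x + tw] for row in frame[y:y + th]]
            score = sad(patch, template)
            if score < best_score:
                best_pos, best_score = (x, y), score
    return best_pos, best_score

# Tiny grayscale frame: zeros everywhere except a bright 2x2 "device".
frame = [[0] * 8 for _ in range(8)]
for y in (3, 4):
    for x in (4, 5):
        frame[y][x] = 200
template = [[200, 200], [200, 200]]

# Restricting the search to the detected face region (here, the right
# half of the frame) examines far fewer candidate positions than a
# full-frame search would.
pos, score = search_in_roi(frame, template, roi=(4, 0, 4, 8))
print(pos, score)  # best match at (4, 3) with score 0
```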
The electronic device may track the object in the first image (S804).
When an object has been detected in a specific frame of the first image, the electronic device may skip detection in the frames that follow. Instead, the electronic device may track the object based on the position information and size information of the object detected in that frame. The electronic device may generate position information and size information of the tracked object.
The electronic device may generate an object model based on the position information and size information of the first detected object, and sequentially track the object over subsequently input frames based on the object model. Tracking may erroneously lock onto a region that matches the object model even when the object to be tracked does not exist in the input frame. Therefore, the electronic device may generate position information and size information of the tracked object and verify whether the object actually exists in the input frame based on that information.
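The track-then-verify step above can be sketched as follows. This is a minimal sketch assuming a sum-of-absolute-differences score and an arbitrary presence threshold; the patent does not specify a particular matcher or threshold, so both are illustrative assumptions.

```python
# Illustrative sketch: track an object near its last known position and
# verify it still exists by thresholding the residual match score.

def sad(patch, template):
    return sum(abs(p - t) for pr, tr in zip(patch, template)
               for p, t in zip(pr, tr))

def track(frame, model, last_pos, radius=2, max_score=100):
    """Search a window around last_pos; report (found, pos, score)."""
    th, tw = len(model), len(model[0])
    lx, ly = last_pos
    best_pos, best_score = last_pos, float("inf")
    for y in range(max(0, ly - radius), min(len(frame) - th, ly + radius) + 1):
        for x in range(max(0, lx - radius), min(len(frame[0]) - tw, lx + radius) + 1):
            patch = [row[x:x + tw] for row in frame[y:y + th]]
            score = sad(patch, model)
            if score < best_score:
                best_pos, best_score = (x, y), score
    # Presence check: a high residual means the model no longer matches,
    # so the tracker should fall back to detection rather than drift.
    return best_score <= max_score, best_pos, best_score

model = [[200, 200], [200, 200]]
frame = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (3, 4):
        frame[y][x] = 200

found, pos, score = track(frame, model, last_pos=(2, 2))
print(found, pos, score)  # True (3, 2) 0

empty = [[0] * 8 for _ in range(8)]
found2, _, _ = track(empty, model, last_pos=(3, 2))
print(found2)  # False: the best residual exceeds the threshold
```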
The electronic device may track the direction of the object in the first image (S805).
Based on the tracked position information and size information of the object, the electronic device may determine whether the object exists in the input frame and track the direction of the object. For example, if the tracked object is determined to be the head mounted display, the electronic device may estimate the direction of the head mounted display, that is, the direction of the user's face. The electronic device may generate user face direction information based on the estimation result.
The direction of the user in the first image captured by the first camera may change according to the user's posture and motion. On the other hand, the direction of the user's face captured by the second camera included in the head mounted display may remain constant regardless of the user's posture and motion, because the head mounted display is fixed to the user's head or a part of the face.
FIG. 8B is a flowchart illustrating an operation sequence of an electronic device for generating a composite image, according to an exemplary embodiment.
Referring to FIG. 8B, the electronic device may determine a third image corresponding to the direction of the face of the user in the user image database (S806).
The electronic device may determine, in the user image database, a third image having the same or similar direction as the face direction of the current user according to the x-axis, y-axis, and z-axis directions based on the user's face portion direction information. The electronic device may convert the direction of the second image to the direction of the first image based on the user facial part direction information.
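The lookup of a third image matching the current face direction can be sketched as follows. A database keyed by (yaw, pitch, roll) angles at 15-degree intervals and a Euclidean angle distance are assumed here for illustration; the patent specifies neither the angular spacing nor the distance measure.

```python
# Illustrative sketch: pick, from a prebuilt database of face images
# stored at fixed angular intervals, the entry whose (yaw, pitch, roll)
# is closest to the estimated face direction.

def angle_distance(a, b):
    """Euclidean distance between two (yaw, pitch, roll) triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_view(database, direction):
    """database: {(yaw, pitch, roll): image_id}; direction: estimated angles."""
    return min(database, key=lambda k: angle_distance(k, direction))

# Views stored every 15 degrees of yaw (pitch and roll fixed at 0 here).
database = {(yaw, 0, 0): f"face_yaw_{yaw}" for yaw in range(-45, 46, 15)}

# Estimated face direction from the tracked head mounted display.
best = nearest_view(database, direction=(22.0, 3.0, -1.0))
print(database[best])  # the 15-degree yaw view is the closest match
```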
The electronic device may generate illuminance information of the first to third images in operation S807.
The electronic device may determine the average illuminance of the first image. The electronic device may generate first image illuminance information based on the average illuminance of the first image. The electronic device may determine the average illuminance of the second image. The electronic device may generate second image illuminance information based on the average illuminance of the second image. In addition, the electronic device may determine an average illuminance of the third image. The electronic device may generate third image illuminance information based on the average illuminance of the third image.
The electronic device may change the illuminance of the second and third images to be the same as the illuminance of the first image (S808).
The electronic device may change the average illuminance of the second image to be the same as the average illuminance of the user face region of the first image. In addition, the electronic device may change the average illuminance of the third image to be the same as the average illuminance of the user face region of the first image.
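The illuminance matching above can be sketched as follows. An additive brightness offset with clamping to [0, 255] is assumed here for illustration; the patent only requires that the average illuminances be made equal, not how.

```python
# Illustrative sketch: shift the mean brightness of one image so it
# matches the mean brightness of the face region of the first image.

def mean(img):
    """Average pixel value of a 2D grayscale image (list of lists)."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))

def match_illuminance(img, target_mean):
    """Add a constant offset so mean(img) becomes target_mean."""
    offset = target_mean - mean(img)
    return [[min(255, max(0, round(p + offset))) for p in row] for row in img]

face_region = [[120, 130], [140, 110]]   # face region of the first image
second_img = [[40, 60], [50, 50]]        # darker HMD eye-camera image

adjusted = match_illuminance(second_img, mean(face_region))
print(mean(adjusted))  # now 125.0, matching the face region's mean
```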
The electronic device may correct the distortion of the second image (S809).
The second camera included in the head mounted display may use a wide angle lens for capturing a wide range with a relatively short focal length as compared to the first camera. The electronic device may perform image processing for removing distortion by the wide-angle lens on the second image having the changed illuminance. In addition, the electronic device may perform image processing to match the direction of the second image with the direction of the user's face part of the first image.
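The wide-angle distortion correction can be sketched with a one-term radial model. The distortion coefficient, the inverse-mapping formula, and nearest-neighbor resampling are all illustrative assumptions; an actual implementation would use the calibrated intrinsics of the HMD camera.

```python
# Illustrative sketch: remove simple radial (barrel) distortion by
# inverse mapping with a one-term model r_d = r_u * (1 + k * r_u^2).

def undistort(img, k, cx, cy):
    """Nearest-neighbor undistortion of a 2D grayscale image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # For each undistorted output pixel, find where it came
            # from in the distorted input and copy that sample.
            dx, dy = x - cx, y - cy
            r2 = dx * dx + dy * dy
            sx = round(cx + dx * (1 + k * r2))
            sy = round(cy + dy * (1 + k * r2))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out

img = [[(x + y) % 256 for x in range(8)] for y in range(8)]
corrected = undistort(img, k=0.01, cx=3.5, cy=3.5)
print(len(corrected), len(corrected[0]))  # output keeps the 8x8 size
```

Pixels near the image center move very little under this model, while pixels near the edges are pulled in from farther out, which is the qualitative behavior expected when flattening a wide-angle image.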
The electronic device may synthesize the first to third images (S810).
The electronic device may copy an image of the user face region from the third image. The electronic device may generate a first composite image by synthesizing the copied face region image onto the user face region of the first image.
The electronic device may copy a portion of the second image and synthesize it into the first composite image. For example, the electronic device may copy an image of the user's eye region from the second image and synthesize it onto the eye region of the user in the first composite image.
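The two-step compositing above (face region from the third image, then eye region from the second image) can be sketched as follows; the region coordinates and pixel values are arbitrary illustration values, not anything specified by the patent.

```python
# Illustrative sketch: paste the face region copied from the 3D-derived
# third image onto the first image, then paste the eye region copied
# from the HMD's second image onto the result.

def paste(dst, src, dst_xy, src_xy, size):
    """Copy a (w, h) block from src into dst (images as 2D lists)."""
    (dx, dy), (sx, sy), (w, h) = dst_xy, src_xy, size
    for row in range(h):
        dst[dy + row][dx:dx + w] = src[sy + row][sx:sx + w]
    return dst

first = [[0] * 10 for _ in range(10)]    # camera image of the user
third = [[1] * 10 for _ in range(10)]    # 3D face image, matched direction
second = [[2] * 10 for _ in range(10)]   # HMD eye-camera image

# Step 1: replace the face region of the first image with the third image.
composite = paste([row[:] for row in first], third,
                  dst_xy=(2, 2), src_xy=(2, 2), size=(6, 6))
# Step 2: overlay the eye region taken from the second image.
composite = paste(composite, second,
                  dst_xy=(3, 4), src_xy=(0, 0), size=(4, 2))

print(composite[4][3], composite[2][2], composite[0][0])  # 2 1 0
```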
The electronic device may include: a first camera configured to generate a first image photographing a first user; a receiver configured to receive, from a second camera worn by the first user, a second image photographing a specific area of the first user not photographed by the first camera; a processor configured to process the first image to generate a 3D image, detect a region of the second camera from the first image, and synthesize the detected second camera region of the first image with a portion of the 3D image to generate a composite image; and a transmitter configured to transmit the composite image to another electronic device.
The electronic device may include: a first camera configured to generate a first image signal photographing a first user; a second camera disposed on a part of the first user and configured to generate a second image signal photographing another part of the first user; a third camera configured to generate a third image signal photographing the first user wearing the second camera; a processor configured to generate, based on the first image signal, a plurality of first user face images corresponding to a plurality of predetermined angles and sizes of the face part of the first user, detect the second camera region in a specific frame included in the third image signal, estimate a front direction of the face part of the first user based on the first image signal and the second image signal, determine, from among the plurality of first user face images, a first user face image corresponding to the front direction of the face part of the first user, and synthesize a copy image corresponding to the second camera region of the determined first user face image with the second camera region of the third image signal to generate a composite image; and a transceiver configured to transmit the composite image to another electronic device.
According to an embodiment of the present disclosure, a first electronic device may include: a first camera configured to generate a first image photographing a first user; a receiver configured to receive, from a second camera worn by the first user, a second image photographing a specific area of the first user not photographed by the first camera; a processor configured to detect a face image of the first user from the first image, generate a third image, which is a 3D stereoscopic image corresponding to the face part of the first user, based on the first image, and generate a composite image by synthesizing the third image and the second image; and a transmitter configured to transmit the composite image to another electronic device.
In the first electronic device, the processor may generate location information and size information of the face part image of the first user from the first image, detect, in the face part image of the first user, a second camera image, which is an image of the portion where the second camera is worn, and generate location information and size information of the second camera image. The second camera image may be a virtual reality image.
The processor may generate a face image model of the first user based on the position information and size information of the face part image of the first user, determine whether a face image of the first user exists in a frame of the first image based on the face image model of the first user, and generate face part direction information of the first user based on the face image model of the first user.
The processor may generate a second camera image model based on the location information and size information of the second camera image, and may determine whether the second camera image exists in a frame of the first image based on the second camera image model.
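Determining whether the second camera image exists in a frame can be illustrated with an exhaustive template search against the second camera image model. The SSD matcher, threshold, and function name below are assumptions for illustration; the patent does not name a matching algorithm.

```python
import numpy as np

def detect_region(frame, model, threshold=10.0):
    """Locate `model` (e.g. an HMD appearance model) in `frame`.

    Exhaustive sum-of-squared-differences search over all positions.
    Returns ((top, left), (height, width)) -- the location and size
    information -- if the best match is below `threshold`, else None.
    """
    fh, fw = frame.shape
    mh, mw = model.shape
    best_score, best_pos = None, None
    for top in range(fh - mh + 1):
        for left in range(fw - mw + 1):
            window = frame[top:top + mh, left:left + mw].astype(np.float64)
            score = np.mean((window - model.astype(np.float64)) ** 2)
            if best_score is None or score < best_score:
                best_score, best_pos = score, (top, left)
    if best_score is not None and best_score <= threshold:
        return best_pos, (mh, mw)
    return None
```

Returning `None` when the best score exceeds the threshold corresponds to deciding that the second camera image does not exist in that frame.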
The processor may generate 3D image information on the face part of the first user based on the first image, and may generate the third image corresponding to the direction of the face part of the first user based on the direction information of the face part image of the first user and the 3D image information.
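Selecting, among pre-generated face images at predetermined angles, the one matching the estimated face direction reduces to a nearest-angle lookup. A sketch assuming a single yaw axis and placeholder view labels in place of real rendered images; a full system would also index by pitch, roll, and size:

```python
import numpy as np

def select_face_view(face_views, yaw_deg):
    """Pick the pre-generated face view whose predetermined angle is
    closest to the estimated face direction (yaw, in degrees)."""
    angles = np.array(sorted(face_views))
    nearest = angles[np.abs(angles - yaw_deg).argmin()]
    return face_views[nearest]

# Placeholder labels stand in for rendered 3D face images.
views = {-30: "left", 0: "front", 30: "right"}
```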
The processor may generate first image illuminance information on the average illuminance of the first image, generate second image illuminance information on the average illuminance of the second image, and generate third image illuminance information on the average illuminance of the third image.
The processor may change the second image illuminance information and the third image illuminance information to correspond to the first image illuminance information based on the first image illuminance information.
The processor may correct the distortion of the second image whose illuminance information has been changed, and may change the direction of the second image to correspond to the direction of the face part of the first user based on the direction information of the face part image of the first user.
The processor may generate a changed image in which the face image of the first user is changed to the third image, and may generate the composite image by synthesizing the changed image and the second image based on the position information, size information, and direction information of the face part image of the first user and the position information and direction information of the second camera image.
The processor may monitor a face image of the first user in the first image and update the composite image based on the monitoring result.
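The monitoring-and-update step might, for example, re-synthesize the composite only when the tracked face region changes appreciably between frames. The mean-absolute-difference test and its threshold below are illustrative assumptions, not the patent's stated criterion:

```python
import numpy as np

def needs_update(prev_face, cur_face, threshold=5.0):
    """Decide whether the composite image should be re-synthesized.

    Compares the mean absolute difference of the tracked face region
    between consecutive frames against an illustrative threshold.
    """
    diff = np.abs(cur_face.astype(np.float64) - prev_face.astype(np.float64))
    return float(diff.mean()) > threshold
```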
According to an embodiment of the present disclosure, a method of operating a first electronic device may include: generating a first image photographing a first user by using a first camera; receiving, from a second camera worn by the first user, a second image photographing a specific area of the first user not photographed by the first camera; detecting a face image of the first user from the first image; generating a third image, which is a 3D stereoscopic image corresponding to the face part of the first user, based on the first image; generating a composite image by synthesizing the third image and the second image; and transmitting the composite image to another electronic device.
The detecting of the face image of the first user may include: generating location information and size information of the face part image of the first user from the first image; detecting, in the face part image of the first user, a second camera image, which is an image of the portion where the second camera is worn; and generating location information and size information of the second camera image. The second camera image may be a virtual reality image.
The detecting of the face image of the first user may include generating a face image model of the first user based on location information and size information of the face image of the first user; Determining whether a face image of the first user exists in a frame of the first image based on the face image model of the first user; And generating facial part direction information of the first user based on the facial part image model of the first user.
The detecting of the face image of the first user may include generating the second camera image model based on location information and size information of the second camera image; And determining whether the second camera image exists in a frame of the first image based on the second camera image model.
The generating of the third image may include generating three-dimensional image information of the face part of the first user based on the first image; And generating the third image corresponding to the direction of the face part of the first user based on the direction information of the face part image of the first user and the 3D image information.
The generating of the third image may include: generating first image illuminance information regarding an average illuminance of the first image; generating second image illuminance information regarding an average illuminance of the second image; and generating third image illuminance information regarding an average illuminance of the third image.
The generating of the third image may further include changing the second image illuminance information and the third image illuminance information to correspond to the first image illuminance information, based on the first image illuminance information.
The generating of the third image may include correcting a distortion of the second image having the illumination information changed; And changing the direction of the second image to correspond to the direction of the face part of the first user based on the direction information of the face part image of the first user.
The generating of the composite image may include: generating a changed image in which the face image of the first user is changed to the third image; and generating the composite image by synthesizing the changed image and the second image based on the position information, size information, and direction information of the face part image of the first user and the position information and direction information of the second camera image.
The generating of the synthesized image may include: monitoring a face image of the first user in the first image; And updating the composite image based on the monitoring result.
The methods according to the invention may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable medium may be those specially designed and constructed for the present invention, or those known and available to those skilled in computer software.
Examples of computer readable media include hardware devices that are specifically configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like. Examples of program instructions include machine language code, such as produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware device described above may be configured to operate with at least one software module to perform the operations of the present invention, and vice versa.
Although the present invention has been described above with reference to the embodiments, those skilled in the art will understand that the present invention can be variously modified and changed without departing from the spirit and scope of the invention as set forth in the claims below.
Claims (20)
A first camera generating a first image of the first user;
A receiver configured to receive, from a second camera worn by the first user, a second image photographing a specific area of the first user not photographed by the first camera;
A processor configured to detect a face image of the first user from the first image, generate a third image, which is a 3D stereoscopic image corresponding to the face part of the first user, based on the first image, and generate a composite image by synthesizing the third image and the second image; And
And a transmitter configured to transmit the composite image to another electronic device.
The processor generates position information and size information of the face part image of the first user from the first image, detects, in the face part image of the first user, a second camera image, which is an image of the portion where the second camera is worn, and generates location information and size information of the second camera image.
The processor generates a face image model of the first user based on the position information and size information of the face part image of the first user, determines whether a face image of the first user exists in a frame of the first image based on the face image model of the first user, and generates face direction information of the first user based on the face image model of the first user.
The processor generates a second camera image model based on the location information and size information of the second camera image, and determines whether the second camera image exists in a frame of the first image based on the second camera image model.
The processor generates 3D image information on the face part of the first user based on the first image, and generates the third image corresponding to the direction of the face part of the first user based on the direction information of the face part image of the first user and the 3D image information.
The processor is configured to generate first image illuminance information on the average illuminance of the first image, generate second image illuminance information on the average illuminance of the second image, and generate third image illuminance information on the average illuminance of the third image.
The processor is further configured to change the second image illuminance information and the third image illuminance information to correspond to the first image illuminance information based on the first image illuminance information.
The processor may be configured to correct distortion of the second image whose illuminance information has been changed, and to change the direction of the second image to correspond to the direction of the face part of the first user based on the direction information of the face part image of the first user.
The processor generates a changed image in which the face image of the first user is changed to the third image, and generates the composite image by synthesizing the changed image and the second image based on the position information, size information, and direction information of the face part image of the first user and the position information and direction information of the second camera image.
The processor is configured to monitor a face image of the first user in the first image and to update the composite image based on the monitoring result.
Generating a first image of the first user by using the first camera;
Receiving, from a second camera worn by the first user, a second image photographing a specific area of the first user not photographed by the first camera;
Detecting a face image of the first user from the first image;
Generating a third image, which is a 3D stereoscopic image, corresponding to the face part of the first user based on the first image;
Generating a synthesized image obtained by synthesizing the third image and the second image; And
Transmitting the composite image to another electronic device;
The detecting of the face part image of the first user may include:
Generating location information and size information of a face image of the first user from the first image;
Detecting a second camera image, which is an image of a part where the second camera is worn, from a face image of the first user; And
Generating location information and size information of the second camera image.
The detecting of the face part image of the first user may include:
Generating a face image model of the first user based on location information and size information of the face image of the first user;
Determining whether a face image of the first user exists in a frame of the first image based on the face image model of the first user; And
And generating facial part direction information of the first user based on the facial part image model of the first user.
The detecting of the face part image of the first user may include:
Generating the second camera image model based on location information and size information of the second camera image; And
And determining whether the second camera image exists in a frame of the first image based on the second camera image model.
Generating the third image,
Generating 3D image information on the face part of the first user based on the first image; And
And generating the third image corresponding to the direction of the face part of the first user based on the direction information of the face part image of the first user and the three-dimensional image information.
Generating the third image,
Generating first image illumination information regarding an average illumination of the first image;
Generating second image illuminance information regarding an average illuminance of the second image; And
Generating third image illuminance information regarding the average illuminance of the third image.
Generating the third image,
And changing the second image illuminance information and the third image illuminance information to correspond to the first image illuminance information, based on the first image illuminance information.
Generating the third image,
Correcting distortion of the second image in which illuminance information is changed; And
And changing the direction of the second image to correspond to the direction of the face part of the first user based on the direction information of the face part image of the first user.
Generating the composite image,
Generating a changed image in which the face image of the first user is changed to the third image;
Generating the composite image by synthesizing the changed image and the second image based on the position information, size information, and direction information of the face part image of the first user and the position information and direction information of the second camera image.
Generating the composite image,
Monitoring a face image of the first user in the first image; And
And updating the synthesized image based on the monitoring result.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160156459 | 2016-11-23 | ||
KR20160156459 | 2016-11-23 |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20180058199A KR20180058199A (en) | 2018-05-31 |
KR102044003B1 true KR102044003B1 (en) | 2019-11-12 |
Family
ID=62454562
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020170155550A KR102044003B1 (en) | 2016-11-23 | 2017-11-21 | Electronic apparatus for a video conference and operation method therefor |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR102044003B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084775B (en) * | 2019-05-09 | 2021-11-26 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2820842A1 (en) * | 2012-02-27 | 2015-01-07 | ETH Zürich | Method and system for image processing in video conferencing for gaze correction |
US9524588B2 (en) * | 2014-01-24 | 2016-12-20 | Avaya Inc. | Enhanced communication between remote participants using augmented and virtual reality |
- 2017-11-21: KR application KR1020170155550A filed (patent KR102044003B1, active IP Right Grant)
Also Published As
Publication number | Publication date |
---|---|
KR20180058199A (en) | 2018-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10445917B2 (en) | Method for communication via virtual space, non-transitory computer readable medium for storing instructions for executing the method on a computer, and information processing system for executing the method | |
CN110506249B (en) | Information processing apparatus, information processing method, and recording medium | |
US20200410713A1 (en) | Generating pose information for a person in a physical environment | |
US20180315364A1 (en) | Information Processing Apparatus and Image Generation Method | |
US10546407B2 (en) | Information processing method and system for executing the information processing method | |
EP3422149B1 (en) | Methods, apparatus, systems, computer programs for enabling consumption of virtual content for mediated reality | |
US10410395B2 (en) | Method for communicating via virtual space and system for executing the method | |
CN102959616A (en) | Interactive reality augmentation for natural interaction | |
JPWO2017122299A1 (en) | Facial expression recognition system, facial expression recognition method and facial expression recognition program | |
JP2018532173A (en) | Shared reality content sharing | |
JP7081052B2 (en) | Displaying device sharing and interactivity in simulated reality (SR) | |
US20180299948A1 (en) | Method for communicating via virtual space and system for executing the method | |
US20150138301A1 (en) | Apparatus and method for generating telepresence | |
US10564801B2 (en) | Method for communicating via virtual space and information processing apparatus for executing the method | |
KR20200038111A (en) | electronic device and method for recognizing gestures | |
WO2017061890A1 (en) | Wireless full body motion control sensor | |
US20190227695A1 (en) | Immersive displays | |
KR102044003B1 (en) | Electronic apparatus for a video conference and operation method therefor | |
JP6518645B2 (en) | INFORMATION PROCESSING APPARATUS AND IMAGE GENERATION METHOD | |
CN110968248B (en) | Generating a 3D model of a fingertip for visual touch detection | |
JP2021180425A (en) | Remote control system, remote work device thereof, video processing device and program | |
US20200342833A1 (en) | Head mounted display system and scene scanning method thereof | |
JP2021009647A (en) | Virtual reality control apparatus, virtual reality head set, virtual reality control method, and program | |
US11882172B2 (en) | Non-transitory computer-readable medium, information processing method and information processing apparatus | |
KR102657318B1 (en) | Personalized apparatus for virtual reality based on remote experience and method for providing virtual reality experience |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant |