WO2020262725A1 - Method for providing augmented reality information based on three-dimensional object recognition using deep learning, and system using the same - Google Patents
Method for providing augmented reality information based on three-dimensional object recognition using deep learning, and system using the same
- Publication number
- WO2020262725A1 (PCT application PCT/KR2019/007656)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- augmented reality
- learning
- image
- neural network
- providing system
- Prior art date
Classifications
- G06T19/006 — Mixed reality (G06T19/00: manipulating 3D models or images for computer graphics)
- G06N3/02 — Neural networks; G06N3/08 — Learning methods (G06N3/00: computing arrangements based on biological models)
- G06Q50/10 — Services (G06Q50/00: ICT specially adapted for implementation of business processes of specific business sectors)
- G06T17/20 — Finite element generation, e.g. wire-frame surface description, tesselation (G06T17/00: 3D modelling)
- G06T19/20 — Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
- G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT] (G06V10/40: extraction of image or video features)
- G06T2207/20081 — Training; Learning (G06T2207/00: indexing scheme for image analysis or image enhancement)
- G06T2207/20084 — Artificial neural networks [ANN]
Definitions
- The present invention relates to a method of providing information using augmented reality based on 3D object recognition, and to a system using the same. More specifically, the present invention implements a user-customized augmented reality authoring tool that overlays related information on the feature regions of a desired object, and provides a system in which training by deep learning on a server enables 3D object recognition with lightweight computation on the user device.
- Augmented reality (AR) technology currently used in industry is built on 2D image and pattern recognition, so its applications are limited and an augmented reality experience can be provided only from a specific angle.
- In particular, the side or the back of an object cannot be used for augmented reality. Therefore, to support realistic interaction with objects in the real world and to realize augmented reality that changes with the movement of real-world objects, it is necessary to recognize the type, size, position, and orientation of those objects, to make use of the front, side, and back of an object alike, and thereby to realize a more immersive and lifelike augmented reality.
- An object of the present invention is to provide an augmented reality authoring tool that allows a user to add, modify, and delete augmented reality content in real time, so that augmented reality can be realized more easily and at lower cost.
- Another object of the present invention is to recognize the type, size, and location of objects in three-dimensional space using a convolutional neural network (CNN), a state-of-the-art object recognition technique, and thereby to provide a realistic and dynamic augmented reality that can be utilized throughout industry.
- A further object of the present invention is to provide an apparatus and method that recognize an object without a marker through the depth camera of the user device and provide augmented reality information from all angles, including the side and rear of the object, even when the product is rotated.
- Another object of the present invention is to provide a system in which training by deep learning is performed on a server, using GPU resources and abundant data, and only the training result is transmitted to the user device, so that objects can be recognized with lightweight computation.
- Finally, the present invention aims to provide a system that uses deep-learning resources efficiently by dividing the artificial neural network into a part used in common and a part used individually per user, thereby reducing the burden of training separately for each user.
- To these ends, an information providing system using augmented reality based on 3D object recognition may be provided, comprising: a user device for authoring augmented reality, which includes a camera and is configured to photograph an object from various angles; and a server configured to perform learning for object recognition from the images of the object.
- The camera of the user device may be a depth camera.
- The user device may receive a plurality of RGB images photographed from various angles, including the front, side, and rear of the object, and may perform 3D mesh reconstruction based on the plurality of RGB images.
- The user device may generate a feature region related to at least a partial region of the object and associate with it at least one of a text, a guide line, an image, and a video related to the feature region.
- The server performs learning through an artificial neural network, and the learning through the artificial neural network may include first learning, which finds feature points in an image, and second learning, which finds the type of object from the feature points.
- The artificial neural network includes a lower-layer part and a higher-layer part: the lower-layer part performs the first learning using all the images received from a plurality of user devices, while the higher-layer part performs the second learning targeting only the objects of each individual user device.
- The system may further comprise a customer device configured to photograph an object and to receive information based on augmented reality. The customer device can recognize at least one of the location, orientation, and feature region of the object through image matching, using a scale-invariant feature transform (SIFT), between its captured image and the original images previously photographed from various angles by the user device.
- The customer device is configured to render at least one of the text, guide lines, images, and videos related to the feature region onto the captured image, and recognizes the location of the customer device through a simultaneous localization and mapping (SLAM) algorithm.
- The customer device may continuously track changes in its own position while simultaneously tracking the position and orientation of the object.
- In addition, a user device for authoring augmented reality may be provided, comprising: a camera capable of photographing an object from various angles; a processing unit configured to generate, based on user input, a feature region related to at least a partial region of the object and to associate with it at least one of a text, a guide line, an image, and a video related to the feature region; and a communication unit for data communication, wherein the communication unit transmits the images of the object and the information related to the feature region to a server configured to perform learning from the images of the object through an artificial neural network.
- According to the present invention, augmented reality can be implemented more easily and at lower cost by providing an augmented reality authoring tool through which a user can add, modify, and delete augmented reality content in real time.
- By recognizing the type, size, and location of objects in three-dimensional space with a convolutional neural network, a state-of-the-art object recognition technique, a realistic and dynamic augmented reality can be provided for use throughout industry.
- Moreover, a system can be provided that reduces the burden of training separately for each user and uses deep-learning resources efficiently by dividing the network into a part used in common and a part used individually per user.
- FIG. 1 is a block diagram illustrating a configuration of a user device for authoring an augmented reality according to an embodiment of the present invention.
- FIG. 2 is a flowchart illustrating an augmented reality authoring method according to an embodiment of the present invention.
- FIG. 3 is an exemplary diagram for explaining a deep-learning training method of a server according to an embodiment of the present invention.
- FIG. 4 is an exemplary system diagram for describing a process of authoring augmented reality, training an artificial neural network, and experiencing augmented reality according to an embodiment of the present invention.
- FIGS. 5A to 5C are exemplary views illustrating a method of authoring augmented reality according to an embodiment of the present invention.
- FIGS. 6A to 6C are exemplary diagrams for explaining a process of experiencing augmented reality according to an embodiment of the present invention.
- FIG. 7 is an exemplary diagram illustrating an object recognition algorithm through deep learning using an artificial neural network according to an embodiment of the present invention.
- Each component shown in the embodiments of the present invention is shown independently to represent a distinct function; this does not mean that each component consists of separate hardware or a single software module. That is, the components are listed separately for convenience of explanation: at least two components may be combined into one, or one component may be divided into a plurality of components that share its function. Embodiments in which these components are integrated and embodiments in which they are separated are also included in the scope of the present invention, as long as they do not depart from its essence.
- FIG. 1 is a block diagram illustrating a configuration of a user device 100 for authoring an augmented reality according to an embodiment of the present invention.
- The user device 100 for authoring augmented reality may include a communication unit 110, an input receiving unit 120, a display unit 130, a camera unit 140, a memory unit 150, a processing unit 160, and the like, but is not limited to these components.
- The user device 100 is a terminal capable of receiving, processing, and displaying various data, and may be, for example, any one of a smart phone, tablet computer, desktop computer, laptop computer, notebook, workstation, personal digital assistant (PDA), portable computer, wireless phone, mobile phone, e-book reader, portable multimedia player (PMP), portable game console, digital camera, television, wearable device, or artificial intelligence (AI) speaker, but is not limited to these.
- The communication unit 110 may be a module or part configured to receive information necessary for augmented reality authoring from a server 200 (not shown) or an external device through a network, or to transmit acquired information to the server or an external device.
- The network may be wired or wireless; when the network is a wireless communication network, it may include cellular communication or short-range communication.
- Cellular communication may include at least one of Long-Term Evolution (LTE), LTE Advanced (LTE-A), 5th Generation (5G), Code Division Multiple Access (CDMA), Wideband CDMA (WCDMA), Universal Mobile Telecommunications System (UMTS), Wireless Broadband (WiBro), or Global System for Mobile Communications (GSM).
- Short-range communication may include at least one of Wireless Fidelity (Wi-Fi), Bluetooth, Zigbee, or Near Field Communication (NFC).
- The communication method is not limited to these, and will include wireless communication technologies developed in the future.
- The input receiving unit 120 is a component for receiving a user's input. To receive the various inputs by which the user manipulates and selects, it may include at least one of a touch screen, touch pad, touch panel, keypad, dome switch, physical button, jog shuttle, microphone, or sensor, but is not limited to these.
- The display unit 130 is a component for visually providing the user with information for authoring augmented reality and with information about what the camera is capturing.
- The display unit 130 may include a liquid crystal display (LCD), a light-emitting diode (LED) display, an organic light-emitting diode (OLED) display, a micro-LED display, a micro-electromechanical systems (MEMS) display, or an electronic paper display, but is not limited to these.
- The camera unit 140 is configured so that a user can photograph, from various angles, an object that is to become augmented reality content. It may be a depth-recognition camera or depth camera capable of generating depth information, for example a stereo camera, an infrared (IR) camera, or a laser scanner, and may include various kinds of sensors and modules capable of obtaining depth information. By using the depth information acquired by the camera unit 140, the size of an object can be recognized easily, at low cost, and with high reliability, and the accuracy of object recognition can be improved.
- The memory unit 150 may store data received or generated by the user device 100, as well as commands or data related to at least one other component. It may be, for example, a hard disk drive (HDD), a solid-state drive (SSD), flash memory, or any other suitable storage device, including dynamic memory such as SRAM and DRAM. Camera-capture information and the various information necessary for authoring augmented reality may be stored in the memory unit 150, and the stored data may be accessed and read by the processing unit 160 as needed.
- The processing unit 160 is configured to generate a 3D model from the captured object images, to associate information to be provided in augmented reality (AR) or mixed reality (MR) with the feature regions of the object, and to input and edit the various information provided in augmented or mixed reality.
- The processing unit 160 may include a central processing unit (CPU), an application processor (AP), and the like, and may include a memory capable of storing instructions or data related to at least one other component, or may access necessary information by communicating with the memory unit 150 or, if necessary, with an external memory.
- The 3D model generation unit 161, the feature region information generation unit 162, and the editing processing unit 163 constituting the processing unit 160 may be programs or program modules that can be executed by one or more processors.
- The programs or program modules included in the processing unit 160 may take the form of an operating system (OS), an application program, or a program, and may be physically stored on various types of widely used storage devices.
- Such programs or program modules may include one or more routines, subroutines, programs, objects, components, instructions, and data structures in various forms for performing a specific task or executing a specific data type, but are not limited to these forms.
- The 3D model generation unit 161 receives a plurality of RGB images photographed by the camera unit 140 at various angles from the different sides of the object, that is, the front, side, and rear, and is configured to create a three-dimensional (3D) model based on the received images.
- The 3D model generation unit 161 may generate 3D mesh data from a plurality of images continuously photographed around the object at various angles, and through this may perform 3D mesh reconstruction. The generated 3D mesh information and the plurality of RGB images are transmitted to the server 200, where deep-learning training may be performed based on the mesh information and the captured images.
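To make the mesh reconstruction step concrete, the following is a minimal sketch that fuses RGB-D captures from several viewpoints into a triangle mesh. The open-source Open3D library, the PrimeSense intrinsic preset, and the assumption that a camera-to-world pose is known for every frame are illustrative choices, not details taken from the patent.

```python
import numpy as np
import open3d as o3d

def reconstruct_mesh(color_images, depth_images, poses):
    """color_images/depth_images: one o3d.geometry.Image per viewpoint
    (front, side, rear, ...); poses: 4x4 camera-to-world matrices."""
    intrinsic = o3d.camera.PinholeCameraIntrinsic(
        o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)
    volume = o3d.pipelines.integration.ScalableTSDFVolume(
        voxel_length=0.005, sdf_trunc=0.02,
        color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)
    for color, depth, pose in zip(color_images, depth_images, poses):
        rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
            color, depth, convert_rgb_to_intensity=False)
        volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))  # fuse this view
    mesh = volume.extract_triangle_mesh()   # the reconstructed 3D mesh
    mesh.compute_vertex_normals()
    return mesh                             # e.g. saved and sent to the server
```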
- The feature region information generation unit 162 generates a feature region related to at least a partial region of the object as designated by the user, and associates with it the information the user specifies for that region, such as at least one of text, guide lines, images, and videos.
- The user can thus create the information related to the object's feature regions, to be provided in augmented or mixed reality, as various visual elements; feature regions can be designated at various angles, including the side and rear of the object, and additional information related to each region can be provided.
- The editing processing unit 163 is configured to input, edit, and update the information related to a feature region, which can be set in various forms such as text, guide lines, images, and videos.
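One plausible way to represent a feature region and its associated content is sketched below; the field names and the JSON layout are illustrative assumptions, not the patent's actual data format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class FeatureRegion:
    region_id: str
    vertices: List[List[float]]               # polygon on the object's 3D mesh
    text: str = ""                            # guidance text for the region
    guide_lines: List[List[float]] = field(default_factory=list)
    image_urls: List[str] = field(default_factory=list)
    video_urls: List[str] = field(default_factory=list)

# Hypothetical example: a region authored on a washing machine door.
region = FeatureRegion(
    region_id="washer_door",
    vertices=[[0.10, 0.20, 0.05], [0.40, 0.20, 0.05], [0.40, 0.60, 0.05]],
    text="Pull the handle to open the door.",
    video_urls=["https://example.com/door_demo.mp4"],
)
payload = json.dumps(asdict(region))          # sent to the server with the images
```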
- In this way, augmented reality authoring can be carried out more easily and at lower cost through a program or application on the user device 100, and the augmented reality content created on the user device 100 can be experienced through a related augmented reality use/experience program or app on a customer terminal or device.
- FIG. 2 is a flowchart illustrating an augmented reality authoring method according to an embodiment of the present invention.
- A user authoring augmented reality may photograph, through the camera unit 140 of the user device 100, images of the object that is to become augmented reality content from various angles, which may include its front, side, and rear (S210).
- The 3D model generation unit 161 of the user device 100 may obtain 3D model information by receiving the images of the object taken from various angles and generating a 3D mesh based on them (S220).
- The user may designate a feature region, that is, an area of the object that needs guidance or additional information, and input the related information. The feature region information generation unit 162 generates and processes the feature region designated by the user, and the information to be provided in augmented reality, such as text, guide lines, images, and videos corresponding to that feature region, may be created and stored in association with it (S230).
- The user may edit and update the augmented reality information, that is, the information to be provided in augmented reality (S240).
- The edited and updated information may also be transmitted to the server 200 so that it is updated there as well.
- FIG. 3 is an exemplary diagram for explaining a deep-learning training method of a server according to an embodiment of the present invention.
- Deep learning is a type of machine learning modeled on artificial neural networks, in which the layers of the network are stacked deep.
- A convolutional neural network (CNN), which is mainly used for image processing, is a deep artificial neural network in which convolutional and pooling layers are stacked in succession; it can be used to recognize images, identify the objects in them, and extract important information.
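As a minimal illustration of such a network, the sketch below stacks two convolution-plus-pooling blocks followed by a classifier; the layer sizes and the class count are arbitrary assumptions chosen for demonstration, not values from the patent.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Convolution + pooling layers stacked in succession extract features.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Classifier maps the pooled features to object-type scores.
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = SimpleCNN()
scores = model(torch.randn(1, 3, 224, 224))   # 224 -> 56 after two 2x pools
```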
- In performing learning through an artificial neural network, the server 200 may include a first learning step (S310) of finding feature points in an image and a second learning step (S320) of finding the type of object from the feature points.
- The artificial neural network includes a lower-layer part and a higher-layer part. The lower-layer part performs the first learning step (S310), finding feature points in images, using all the images received from a plurality of user devices, while the higher-layer part performs the second learning step (S320), finding the type of object from the feature points, targeting only the objects of each individual user device. The lower-layer weights are then fixed and only the higher layers change: the lower-layer part is shared among all users, while the higher-layer weights differ for each user.
- Training an artificial neural network requires a sufficient amount of data. To recognize objects from the small number of images each user takes, the neural network is first trained using images of various objects photographed by many users together; if each user then further trains the network only on the objects that user wants, an artificial neural network suited to each user can be constructed.
- This embodiment secures generality by sharing part of the neural network through the stepwise processing of the lower-layer and higher-layer parts, while enabling a more customized response to each user's objects. It thereby avoids both the overfitting problem that arises when a separate network is trained for each user, in which data other than that user's input is handled poorly, and the difficulty of responding to all objects that arises when every user shares a single network.
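A hedged sketch of this two-stage scheme follows: the lower layers (feature-point extraction) are trained on images from all users and then frozen, and each user gets a separate higher-layer head trained only on that user's objects. The module shapes, user names, and optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

# First learning: lower layers shared by every user (feature extraction).
shared_lower = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

def make_user_head(num_user_classes: int) -> nn.Module:
    # Second learning: one head per user, trained on that user's objects only.
    return nn.Linear(32, num_user_classes)

heads = {"customer_A": make_user_head(5), "customer_B": make_user_head(8)}

# After stage 1 the lower-layer weights are fixed; only the heads change.
for p in shared_lower.parameters():
    p.requires_grad = False

# Stage 2 for one user: updates only that user's head.
opt = torch.optim.Adam(heads["customer_A"].parameters(), lr=1e-3)
x, y = torch.randn(4, 3, 224, 224), torch.randint(0, 5, (4,))
loss = nn.functional.cross_entropy(heads["customer_A"](shared_lower(x)), y)
loss.backward()
opt.step()
```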
- By operating the server for its users and continuously growing the training data in a database, an ever smarter solution can be provided.
- FIG. 4 is an exemplary system diagram for describing a process of authoring augmented reality, training an artificial neural network, and experiencing augmented reality according to an embodiment of the present invention.
- A customer can author and create augmented reality content related to a desired object through an augmented reality authoring/editing app.
- For example, customer A may author augmented reality content related to a first object using a user device 100, and customer B may author content related to a second object, different from the first, using another user device.
- The images captured by customer A and customer B of the first and second objects, respectively, are transmitted to the server 200, where deep-learning training is performed through an artificial neural network.
- The lower layers of the artificial neural network may perform the first learning, finding feature points in images, using all the images captured by the plurality of user devices, that is, by customer companies A and B.
- The database (DB) is partitioned by customer, so that learning in DB2, targeting only the objects of customer A, and learning in DB1, targeting only the objects of customer B, can be performed separately; the second learning, finding the type of object from the feature points, may then be carried out.
- The server 200 may be configured as a Graphics Processing Unit (GPU) server for deep learning. Owing to the nature of artificial neural networks, training takes a very long time, but executing object recognition does not; once training on the server 200 is finished, object recognition is entirely feasible in a short time even on a customer terminal such as a smartphone.
- The server 200 can quickly determine the type of object in an image through deep learning using the GPU, whereas the performance of a mobile device is not suited to running deep learning directly. The server 200 therefore performs the training, using its GPU resources and abundant image data, and transmits only the training result to the customer terminal, enabling fast object recognition on the terminal.
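One way to realize "train on the server, ship only the result" is to serialize the trained network for inference-only use on the device. The sketch below uses TorchScript as an illustrative serialization choice, with a stand-in model; the patent does not name a framework or format.

```python
import torch
import torch.nn as nn

# Stand-in for the recognizer trained on the GPU server.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Flatten(), nn.Linear(8 * 56 * 56, 10))
model.eval()

# Server side: serialize the trained network ("the learning result").
example = torch.randn(1, 3, 224, 224)
torch.jit.trace(model, example).save("recognizer.pt")

# Customer-terminal side: load and run inference only; no training happens here.
deployed = torch.jit.load("recognizer.pt")
with torch.no_grad():
    scores = deployed(example)                # fast object-type prediction
```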
- The customer photographs the object through the customer terminal, or customer device 300, and the content related to the object is provided in augmented reality or mixed reality, realized through an augmented reality use/experience app on the customer device 300.
- The position and orientation of the object are determined by matching the camera image against the original images of the object previously photographed from multiple angles, and the feature regions designated by the user can then be found.
- That is, the customer device can recognize at least one of the location, orientation, and feature region of the object through image matching, using a scale-invariant feature transform (SIFT), between the captured image and the original images previously captured from various angles by the user device.
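A minimal sketch of such SIFT-based matching is given below, written with OpenCV as an illustrative library choice. The homography it returns maps coordinates authored on the original image into the live frame regardless of the object's scale; the ratio-test threshold and minimum match count are assumed values.

```python
import cv2
import numpy as np

def locate_object(original_gray, frame_gray, min_matches=10):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(original_gray, None)
    kp2, des2 = sift.detectAndCompute(frame_gray, None)
    if des1 is None or des2 is None:
        return None

    # Lowe's ratio test keeps only distinctive matches.
    matcher = cv2.BFMatcher()
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    if len(good) < min_matches:
        return None                           # object not found in this frame

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H    # maps authored feature-region coordinates into the frame
```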
- The customer device 300 may provide augmented reality information by rendering a virtual object at the 3D coordinates of the real object being photographed.
- The augmented reality technology continuously tracks the location of the current customer device 300 using a Simultaneous Localization And Mapping (SLAM) algorithm: the location of the customer device 300 can be obtained from the sequence of consecutive images seen by the user.
- The user's position is relative to the position obtained first, so the distance and direction the camera has moved can be identified.
- While the virtual object is rendered, the deep-learning recognition process described above is repeated, so the position and orientation of the real object are tracked continuously and the user's movement is tracked at the same time.
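The experience loop described above can be pictured as follows. Every interface in this sketch (camera, SLAM tracker, recognizer, matcher, renderer) is a hypothetical placeholder, since the patent names the techniques but no concrete API; only the control flow mirrors the description.

```python
def ar_experience_loop(camera, slam_tracker, recognizer, matcher, renderer):
    while True:
        frame = camera.read()
        # Device pose, relative to the first frame, from consecutive images.
        device_pose = slam_tracker.update(frame)
        # Deep-learning recognition: which authored object is in view?
        obj = recognizer.classify(frame)
        if obj is None:
            continue
        # SIFT matching localizes the object and its feature regions.
        H = matcher.locate(obj.original_image, frame)
        if H is not None:
            # Re-render the associated AR content every frame, so both the
            # device's movement and the object's movement are tracked.
            renderer.draw(obj.feature_regions, H, device_pose)
```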
- FIGS. 5A to 5C are exemplary views illustrating a method of authoring augmented reality according to an embodiment of the present invention.
- A user at a customer company uses the camera unit 140 of the user device 100 for authoring augmented reality to photograph an object, for example the front, side, and rear of a washing machine, and can obtain a continuous series of images from various angles.
- 3D mesh data and RGB image data can be obtained from the plurality of images photographed at various angles, and the user of the customer company can use the input receiving unit 120 of the user device 100 to set a feature region within the object and to input additional information, such as a guide line, text, an image, or a video, related to that feature region; this is the information to be provided in augmented reality.
- The additional information related to the object's feature region, such as guide lines, text, images, and videos, is displayed, and the additional information provided in augmented or mixed reality may be input and edited in the various ways described above.
- FIGS. 6A to 6C are exemplary diagrams for explaining a process of experiencing augmented reality according to an embodiment of the present invention.
- A customer experiencing augmented reality may photograph the object with a camera through the customer terminal 300, and may do so from various angles, such as the side and the rear.
- Through the augmented reality experience app, not only is the object recognized from various angles, but the additional information related to its feature regions can also be displayed continuously on the customer terminal.
- For example, product information may be displayed in augmented reality, overlaid on the washing machine being photographed by the camera.
- The displayed product information is the augmented reality content previously authored by the customer company, that is, the producer or seller of the product, through the augmented reality authoring app.
- The feature points and feature regions of the object may be tracked continuously so that the corresponding information is displayed. Depending on the angle of the customer terminal's camera, additional information related to the feature regions on the side and rear of the object, as well as on its front, can be displayed.
- FIG. 7 is an exemplary diagram illustrating an object recognition algorithm through deep learning using an artificial neural network according to an embodiment of the present invention.
- In the present invention, an artificial neural network that determines the type of object, taking a picture captured through the camera of a customer terminal as input data, makes it possible to find out which object the camera is currently looking at. Owing to the nature of artificial neural networks, training on the server 200 takes a very long time, but capturing an object and performing recognition in real time does not; once image training on the server 200 is finished, object recognition is possible in a short time even on a customer terminal such as a smartphone.
- The camera image is matched against the original images of the object previously photographed from multiple angles (S730) to determine the object's position and orientation (S740), the feature regions designated by the user are found, and the related information can be displayed (S750). A scale-invariant feature transform (SIFT) is used for the image matching; through this, images can be matched regardless of the object's size, and the object's size can also be recognized.
Claims (10)
- An information providing system using augmented reality based on three-dimensional object recognition, comprising: a user device for authoring augmented reality, including a camera and configured to photograph an object from various angles; and a server configured to perform learning for object recognition from images of the object.
- The augmented reality information providing system of claim 1, wherein the camera of the user device is a depth camera.
- The augmented reality information providing system of claim 1, wherein the user device receives a plurality of RGB images photographed from various angles, including the front, side, and rear of the object, and can perform 3D mesh reconstruction based on the plurality of RGB images.
- The augmented reality information providing system of claim 1, wherein the user device can generate a feature region related to at least a partial region of the object and associate with it at least one of a text, a guide line, an image, and a video related to the feature region.
- The augmented reality information providing system of claim 1, wherein the server performs learning through an artificial neural network, and the learning through the artificial neural network includes first learning that finds feature points in an image and second learning that finds the type of object from the feature points.
- The augmented reality information providing system of claim 5, wherein the artificial neural network includes a lower-layer part and a higher-layer part, the lower-layer part performing the first learning using all images received from a plurality of user devices, and the higher-layer part performing the second learning targeting only the objects of each user device.
- The augmented reality information providing system of claim 4, further comprising a customer device configured to photograph an object and to receive information based on augmented reality, wherein the customer device can recognize at least one of the position, orientation, and feature region of the object through image matching, using a scale-invariant feature transform (SIFT), between its captured image and original images previously photographed from various angles by the user device.
- The augmented reality information providing system of claim 7, wherein the customer device is configured to render at least one of the text, guide lines, images, and videos related to the feature region onto the captured image, and can recognize the location of the customer device through a simultaneous localization and mapping (SLAM) algorithm.
- The augmented reality information providing system of claim 8, wherein the customer device continuously tracks changes in the position of the customer device while simultaneously tracking the position and orientation of the object.
- A user device for authoring augmented reality, comprising: a camera capable of photographing an object from various angles; a processing unit configured to generate, based on a user input, a feature region related to at least a partial region of the object and to associate with it at least one of a text, a guide line, an image, and a video related to the feature region; and a communication unit for data communication, wherein the communication unit transmits the images of the object and the information related to the feature region to a server configured to perform learning from the images of the object through an artificial neural network.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020190075575A KR102186821B1 (ko) | 2019-06-25 | 2019-06-25 | Method for providing augmented reality information based on three-dimensional object recognition using deep learning, and system using the same |
KR10-2019-0075575 | 2019-06-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020262725A1 (ko) | 2020-12-30 |
Family
ID=73776714
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2019/007656 WO2020262725A1 (ko) | Method for providing augmented reality information based on three-dimensional object recognition using deep learning, and system using the same | 2019-06-25 | 2019-06-25 |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR102186821B1 (ko) |
WO (1) | WO2020262725A1 (ko) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102643766B1 (ko) * | 2021-02-09 | 2024-03-06 | 주식회사 큐에스 | Object information display method and system based on machine learning and augmented reality technology |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101843585B1 (ko) | 2011-12-16 | 2018-03-30 | 주식회사 엘지유플러스 | Service server and method through object recognition |
KR101506610B1 (ko) * | 2013-04-29 | 2015-03-27 | 주식회사 제이엠랩 | Apparatus and method for providing augmented reality |
KR102031670B1 (ko) * | 2017-11-17 | 2019-10-14 | 주식회사 코이노 | Mobile terminal, remote management apparatus, and augmented-reality-based remote guidance method using the same |
2019
- 2019-06-25: KR application KR1020190075575A filed; granted as KR102186821B1 (status: active, IP right grant)
- 2019-06-25: PCT application PCT/KR2019/007656 filed; published as WO2020262725A1 (status: active, application filing)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090044702A (ko) * | 2007-11-01 | 2009-05-07 | 광주과학기술원 | System and method for providing product information using augmented reality |
US20170286901A1 (en) * | 2016-03-29 | 2017-10-05 | Bossa Nova Robotics Ip, Inc. | System and Method for Locating, Identifying and Counting Items |
KR101887081B1 (ko) * | 2017-05-08 | 2018-08-13 | 주식회사 브리즘 | Method for providing augmented reality content service |
KR101898075B1 (ko) * | 2017-12-29 | 2018-09-12 | 주식회사 버넥트 | Augmented reality system with spatial recognition and object recognition applied simultaneously |
Non-Patent Citations (1)
Title |
---|
GONG CHENG: "A Survey on Object Detection in Optical Remote Sensing Images", ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, 31 July 2016 (2016-07-31), pages 1-32, XP029539337, DOI: https://doi.org/10.1016/j.isprsjprs.2016.03.014 *
Also Published As
Publication number | Publication date |
---|---|
KR102186821B1 (ko) | 2020-12-04 |
Similar Documents
Publication | Title |
---|---|
CN104871214B (zh) | User interface for augmented-reality-enabled devices |
US20220237812A1 (en) | Item display method, apparatus, and device, and storage medium |
WO2020017890A1 (en) | System and method for 3D association of detected objects |
CN107566793A (zh) | Method, apparatus, system, and electronic device for remote assistance |
WO2019017582A1 (ko) | Method and system for automatically generating AR content by collecting crowdsourced AR content templates |
US10902683B2 (en) | Representation of user position, movement, and gaze in mixed reality space |
US20220358662A1 (en) | Image generation method and device |
KR102430029B1 (ko) | Method and system for providing deep-learning-based similar-product search results |
CN110555102A (zh) | Media title recognition method, apparatus, and storage medium |
CN110111241A (zh) | Method and apparatus for generating dynamic images |
KR102466978B1 (ko) | Deep-learning-based virtual image generation method and system |
WO2020262725A1 (ko) | Method for providing augmented reality information based on three-dimensional object recognition using deep learning, and system using the same |
CN112270242B (zh) | Trajectory display method and apparatus, readable medium, and electronic device |
CN112037305B (zh) | Method, device, and storage medium for reconstructing tree-like structures in images |
US11683453B2 (en) | Overlaying metadata on video streams on demand for intelligent video analysis |
CN113822263A (zh) | Image annotation method and apparatus, computer device, and storage medium |
US20190378335A1 (en) | Viewer position coordination in simulated reality |
KR20220041319A (ko) | Deep-learning-based product search method and system |
JP2017182681A (ja) | Image processing system, information processing apparatus, and program |
CN115223248A (zh) | Hand gesture recognition method, and training method and apparatus for a hand gesture recognition model |
CN108305210B (zh) | Data processing method, apparatus, and storage medium |
WO2020175760A1 (en) | Electronic device and content generation method |
WO2020017668A1 (ko) | Avatar generation method and apparatus using multi-view image registration |
WO2018124678A1 (en) | Electronic device and operation method thereof |
WO2024144261A1 (en) | Method and electronic device for extended reality |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19934558; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19934558; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29-04-2022) |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19934558; Country of ref document: EP; Kind code of ref document: A1 |