KR102462934B1

KR102462934B1 - Video analysis system for digital twin technology

Info

Publication number: KR102462934B1
Application number: KR1020200022743A
Authority: KR
Inventors: 장용준
Original assignee: 제주한라대학교산학협력단
Priority date: 2020-02-25
Filing date: 2020-02-25
Publication date: 2022-11-03
Also published as: KR20210108044A

Abstract

The present invention relates to an image analysis system for digital twin technology. Digital twin technology performs simulations by reconstructing real-world targets in a virtual space. To implement virtual human objects for such simulations, the system tracks human movement through image analysis and also collects information about each person's posture. From this information it generates the movements and postures of virtual human objects for use in digital twin simulations.

Real video is analyzed, and the movement paths of objects in the video are stored together with posture information at the level of skeleton data. Virtual objects with movement paths and posture information are then generated according to the situational distribution of the stored object information. As a result, digital twin-based simulations can be supplied with virtual human movements that closely approximate reality, which improves simulation quality.

Description

Video analysis system for digital twin technology

The present invention relates to an image analysis system for digital twin technology. More particularly, it relates to an image analysis system that implements virtual human objects for digital twin technology, which performs simulations by reconstructing real-world targets in a virtual space. The system tracks human movement on the basis of image analysis and also collects information about each person's posture. It then uses this information to generate the movements and postures of virtual human objects for digital twin applications.

Digital twin technology builds a virtual model of a physical object in a computer's virtual space, identical to the real object, and then simulates that model to predict outcomes. It was first adopted in the aviation industry and has since expanded to transportation, logistics, cities, and other domains.

Virtual facilities identical to real ones can be built in a virtual space, and the various situations in which those facilities are used can be simulated. This makes it possible to identify operating methods and improvements for optimal utilization. For this reason the technology is regarded as a core element of the Fourth Industrial Revolution.

In particular, 3D modeling techniques are advancing rapidly, as are virtual reality and physics engines for virtual environments. Even city-scale virtual reality environments are now being built and used for a variety of services. Considerable attention is therefore being paid to improving digital twin technology.

However, a digital twin simulation is effective only if information about the target environment is applied accurately. For simulations of buildings, infrastructure, cities, and the like, the virtual movement of human objects is indispensable.

To date, simulations have used only the statistical movement volume of virtual human objects or, at most, their movement trajectories. Advancing digital twin technology requires more detailed information about virtual objects.

Korean Patent Registration No. 10-1989982, [Modeling system and modeling method through digital twin-based indoor space analysis]
Korean Patent Registration No. 10-1325401, [Device and method for generating a movement path of a virtual object]

To solve the above problems, an object of the present invention is to collect detailed information about real-world human objects, including both their movement paths and their postures. To do this, real video is analyzed to track the movement paths of objects in the video, and each object's posture is extracted as skeleton information segmented by joints. This information is also used for training, which improves the performance of posture classification for objects in the video. Posture is further used to preserve an object's identity even when objects in the video cluster together, disappear, or are occluded. In this way, information about real-world human objects can be collected effectively.

Another object of the present invention is to improve the performance of digital twin technology. The system stores classification information describing the situation in which movement and posture information of real human objects was collected, together with that information. It then generates virtual human objects for the digital twin from the stored per-class object information. The simulation can therefore reflect a realistic environment that includes statistically typical movement as well as the unexpected movements that arise from the specific behavior of real people.

An image analysis system for digital twin technology according to an embodiment of the present invention includes an image analysis-based object generating apparatus and a digital twin-based simulation apparatus.

The image analysis-based object generating apparatus performs the following operations:
- It selects objects from video collected through a camera or a video database.
- It learns the postures of the selected objects through a neural network model.
- It uses the trained model to classify each object's posture, producing posture information that includes skeleton information.
- It stores object information, including the object's identifier, movement path, and posture information, in an image analysis database.
- It stores classification information for the stored object information in the image analysis database. The classification follows at least one criterion among the date, time, and weather at which the object appeared, and environmental conditions preselected as capable of affecting object appearance.
- It generates virtual objects for each classification criterion from the object information stored in the image analysis database and stores them in a virtual object database.

The digital twin-based simulation apparatus constructs virtual reality model data for the simulation target in a virtual reality environment. It then requests virtual objects matching the simulation conditions from the image analysis-based object generating apparatus. Finally, it receives the resulting per-class virtual object information, which has variability, and applies it to the virtual reality environment.

As an example, the image analysis-based object generating apparatus may include the following units:
- An image collection unit collects video through a camera or a video database.
- An object selection unit selects a region of interest from the collected video and selects the objects present in that region.
- A posture learning unit trains a neural network model on the postures of the selected objects.
- A posture classification unit classifies the postures of the selected objects, using the neural network model with the weights learned by the posture learning unit.
- An object identification and tracking unit handles identity and storage, as described next.

The object identification and tracking unit assigns an identifier to each selected object and continuously tracks multiple objects at the same time. It links the posture classified for each object to the corresponding tracked object. When an object that was clustered, lost, or occluded reappears, the unit uses the classified posture to decide whether to keep the previous identifier or create a new one. It stores object information, including each tracked object's identifier, movement path, and posture information, in the image analysis database.

The object identification and tracking unit may also add classification information to the stored object information before storing it in the image analysis database. The classification follows at least one criterion among the date, time, and weather at which the tracked object appeared, and environmental conditions preselected as capable of affecting object appearance.

The apparatus may further include a virtual object generation unit, which generates virtual objects for each classification criterion based on the object information stored in the image analysis database. Within the classification range of paths and postures for each criterion, it generates virtual objects whose paths and postures resemble those of real objects. The mix of generated objects follows the distribution of real objects over those paths and postures. The generated objects are stored in a virtual object database.

The digital twin-based simulation apparatus may request virtual objects for particular simulation conditions. In that case, the virtual object generation unit selects the stored per-class virtual objects that match those conditions, following the per-class distribution, and provides them.

The image analysis system for digital twin technology according to an embodiment of the present invention analyzes real video. It stores the movement paths of objects in the video together with posture information at the level of skeleton data. It then generates virtual objects with movement paths and posture information according to the situational distribution of the stored object information. Digital twin-based simulations can thus be supplied with virtual human movements that closely approximate reality, which improves their quality.

In particular, the paths, movements, and posture information of real human objects are divided into detailed classes, and information about virtual objects is generated from those classes. The system can therefore reflect statistically meaningful human movement as well as sudden actions and other unusual behavior that is generally hard to anticipate. This greatly increases the completeness of digital twin-based simulations.

FIG. 1 is a conceptual diagram showing the configuration of an image analysis system for digital twin technology according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of an image analysis-based object generating apparatus according to an embodiment of the present invention.
FIG. 3 is a conceptual diagram of an image analysis neural network according to an embodiment of the present invention.
FIG. 4 is a conceptual diagram of a neural network for selecting an object of interest in an image according to an embodiment of the present invention.
FIG. 5 is a conceptual diagram of a neural network for determining the posture of an object in an image according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram illustrating posture determination results for objects in an image according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating an operation process according to an embodiment of the present invention.

The technical terms used in the present invention describe specific embodiments only and are not intended to limit the present invention. Unless specifically defined otherwise herein, technical terms should be given the meanings generally understood by those of ordinary skill in the art to which the present invention belongs. They should not be interpreted in an excessively broad or excessively narrow sense. If a technical term used herein is incorrect and does not accurately express the spirit of the invention, it should be read as replaced by a technical term that a person skilled in the art would correctly understand. General terms used herein should be interpreted according to their dictionary definitions or their context, and not in an excessively narrow sense.

As used in the present invention, singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "consisting of" or "comprising" do not mean that every component or step described in the invention must be included. Some components or steps may be omitted, and additional components or steps may be included.

Terms that include ordinal numbers, such as "first" and "second", may be used herein to describe components, but the components are not limited by these terms. The terms serve only to distinguish one component from another. For example, a first component may be called a second component, and a second component may likewise be called a first component, without departing from the scope of the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Identical or similar components receive the same reference numerals regardless of figure number, and redundant descriptions of them are omitted.

A detailed description of related known technology is omitted where it would obscure the gist of the present invention. The accompanying drawings are provided only to make the spirit of the present invention easy to understand and should not be construed as limiting it.

The image analysis-based object generating apparatus and the digital twin-based simulation apparatus according to an embodiment of the present invention may take the form of various terminal devices capable of computation, or of servers with network functions.

When these apparatuses are configured as servers, a server generally means a computer system together with the computer software (web server program) installed on it. Such a system is connected to an unspecified number of clients and/or other servers through an open computer network such as the Internet. It receives task requests from clients or other servers, derives the results, and returns them. The term should also be understood broadly to include the application programs running on the server and, in some cases, the databases built within it.

Such a server may be implemented on general-purpose server hardware using server programs available for operating systems such as DOS, Windows, Linux, UNIX, and Macintosh. It may also be implemented as one or more servers operating in a cloud environment.

As technology advances, the image analysis-based object generating apparatus or the digital twin-based simulation apparatus may instead be configured as a terminal. Such a terminal may be any digital device that accepts user input, displays information to the user, and provides computation, storage, and communication functions. Examples include smartphones, tablet PCs, laptops, ultrabooks, mini PCs, microcontroller-based terminals, and virtual terminals. All-in-one computers, POS terminals, PDAs (Personal Digital Assistants), MIDs (Mobile Internet Devices), and PMPs (Portable Multimedia Players) may also serve.

Detailed embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a conceptual diagram showing the configuration of an image analysis system for digital twin technology according to an embodiment of the present invention.

As shown, the image analysis system 10 for digital twin technology consists of an image analysis-based object generating apparatus 100 and a digital twin-based simulation apparatus 200.

The image analysis-based object generating apparatus 100 learns object selection and object posture classification from tagged training data, that is, video in which the objects and each object's posture are already known. Various neural network models may be used for this.

Once a neural network model has been established through such training, it can serve as a classifier that distinguishes objects in video and classifies each object's posture. The posture information may consist of skeleton information, including joint information, together with the classification result for that posture.

In summary, the image analysis-based object generating apparatus 100 processes collected video in the following order:
1. It selects objects from the collected video.
2. It learns the postures of the selected objects through a neural network model.
3. It uses the trained model to classify each object's posture and compute posture information that includes skeleton information.
4. It stores object information, including the object's identifier, movement path, and posture information, in the image analysis database 101.
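The sketch below shows one stored object record, assuming a SQLite table with JSON-encoded columns. The field names (`object_id`, `path`, `poses`), the 2D joint format, and the posture labels are hypothetical. The patent specifies only that each record holds an identifier, a movement path, and posture information that includes skeleton data.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class PoseSample:
    t: float                           # timestamp in seconds
    label: str                         # posture class, e.g. "walking" (hypothetical labels)
    joints: list[tuple[float, float]]  # 2D skeleton joints in pixel coordinates (assumed format)


@dataclass
class ObjectInfo:
    object_id: int                                                        # tracker-assigned identifier
    path: list[tuple[float, float, float]] = field(default_factory=list)  # (t, x, y) samples
    poses: list[PoseSample] = field(default_factory=list)


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical 'image analysis database' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects "
        "(object_id INTEGER PRIMARY KEY, path TEXT, poses TEXT)"
    )
    return conn


def save_object(conn: sqlite3.Connection, obj: ObjectInfo) -> None:
    # JSON turns tuples into lists; load_object converts them back.
    conn.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)",
        (obj.object_id, json.dumps(obj.path), json.dumps([asdict(p) for p in obj.poses])),
    )
    conn.commit()


def load_object(conn: sqlite3.Connection, object_id: int):
    """Return the stored ObjectInfo, or None if the identifier is unknown."""
    row = conn.execute(
        "SELECT path, poses FROM objects WHERE object_id = ?", (object_id,)
    ).fetchone()
    if row is None:
        return None
    path = [tuple(p) for p in json.loads(row[0])]
    poses = [
        PoseSample(p["t"], p["label"], [tuple(j) for j in p["joints"]])
        for p in json.loads(row[1])
    ]
    return ObjectInfo(object_id, path, poses)
```

Storing paths and poses as JSON keeps the sketch schema-free. A production system would more likely normalize them into separate per-frame tables.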

The object information stored in the image analysis database 101 is the basic information for generating the virtual human objects that are later provided to the digital twin-based simulation apparatus 200.

The collected object information therefore needs to be classified in more detail by situation. The image analysis-based object generating apparatus 100 adds classification information to the stored object information according to at least one classification criterion among the following:
- the date on which the tracked object appeared;
- the time at which it appeared;
- the weather at that time;
- environmental conditions preselected as capable of affecting object appearance, such as national holidays, sale periods, school vacations, the start of a school term, anniversaries, and events.

The result is stored as object information in the image analysis database 101.
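The classification step can be sketched as computing a key for each observation. In this sketch the date is reduced to a weekday plus an event flag looked up in a hypothetical `EVENT_CALENDAR`, and the hour is bucketed into coarse bands. These choices, and every name below, are illustrative assumptions rather than the patent's scheme.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Hypothetical calendar of preselected environmental conditions (holidays, sales, events...).
EVENT_CALENDAR = {"2020-02-25": "sale_period"}


def time_band(hour: int) -> str:
    """Bucket an hour of day into a coarse band (band edges are assumptions)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if hour >= 18:
        return "evening"
    return "night"


def classification_key(seen_at: datetime, weather: str) -> tuple:
    """Build a (weekday, time band, weather, event) key for one observation."""
    day = seen_at.date().isoformat()
    return (WEEKDAYS[seen_at.weekday()], time_band(seen_at.hour), weather,
            EVENT_CALENDAR.get(day, "none"))


def class_counts(observations) -> Counter:
    """observations: iterable of (object_id, seen_at, weather); returns objects per class."""
    return Counter(classification_key(seen_at, weather) for _, seen_at, weather in observations)
```

The per-class counts produced here are the kind of distribution that the later generation and selection steps would need.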

The image analysis-based object generating apparatus 100 also generates virtual objects for each classification criterion from the object information stored in the image analysis database 101, and stores them in the virtual object database 102. On request from the digital twin-based simulation apparatus 200, it selects and provides appropriate virtual objects. These virtual objects may be delivered as simulation-object format information that the digital twin-based simulation apparatus 200 can use directly as objects.

The digital twin-based simulation apparatus 200 works with the simulation target database 201, which holds virtual reality model data for the simulation target, and builds that model data into a virtual reality environment. It then requests virtual objects matching the simulation conditions from the image analysis-based object generating apparatus 100. The simulation conditions are the object classification criteria: date, time, weather, and environmental conditions preselected as capable of affecting object appearance. The apparatus receives the resulting per-class virtual object information with variability and applies it to the virtual reality environment.

"Per-class virtual object information with variability" means that the virtual objects received from the image analysis-based object generating apparatus 100 do not all share exactly the same posture and path. Like real people, each one varies slightly. To achieve this, the image analysis-based object generating apparatus 100 generates virtual objects in one of two ways: it matches a distribution obtained by statistical analysis, or it closely imitates real object information. Following that distribution, it produces typical object information and also, in appropriate proportion, atypical object information.

An airport illustrates the point. More people wander from a known route among waiting areas, nearby shops, information desks, restrooms, and so on than simply follow the known route, and this can be reflected. Unusual movements, such as loitering at a particular spot, and the distinctive movements of objects such as staff members can also be reflected.

This configuration is examined in more detail with reference to FIG. 2.

FIG. 2 is a block diagram showing the configuration of the image analysis-based object generating apparatus 100 according to an embodiment of the present invention.

The image analysis-based object generating apparatus 100 includes the following units:
- An image collection unit 110 collects video through an image source 105 such as a camera or a video database.
- An object selection unit 120 selects a region of interest from the video collected by the image collection unit 110 and selects the objects present in that region.
- A posture learning unit 130 trains a neural network model on the postures of the objects selected by the object selection unit 120.
- A posture classification unit 140 classifies the postures of the objects selected by the object selection unit 120, using the neural network model with the weights learned by the posture learning unit 130.
- An object identification and tracking unit 150 handles identity and storage, as described next.

The object identification and tracking unit 150 assigns an identifier to each object selected by the object selection unit 120 and continuously tracks multiple objects at the same time. It links the posture that the posture classification unit 140 classified for each object to the corresponding tracked object. When an object that was clustered, lost, or occluded reappears, the unit uses the classified posture to decide whether to keep the previous identifier or create a new one. It stores object information, including each tracked object's identifier, movement path, and posture information, in the image analysis database.
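The identifier-retention rule for reappearing objects might look like the following sketch, under two assumptions. First, a lost track may be reclaimed only by a detection with the same posture class. Second, the detection must lie within the distance an object could cover at an assumed maximum speed. The class name `ReIdentifier`, the thresholds, and the exact-label posture match are all hypothetical simplifications. A real system would more likely compare skeleton features than labels, since posture can change during an occlusion.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    x: float          # last known position (pixels)
    y: float
    posture: str      # last classified posture label
    last_frame: int   # frame index at which the track was lost


class ReIdentifier:
    """Hypothetical helper that issues object identifiers and reclaims lost ones."""

    def __init__(self, max_speed: float = 15.0, max_gap: int = 90):
        self.max_speed = max_speed  # assumed max displacement in pixels per frame
        self.max_gap = max_gap      # assumed number of frames a lost track stays claimable
        self.lost: list[Track] = []
        self.next_id = 1

    def mark_lost(self, track: Track) -> None:
        """Call when a tracked object disappears, merges into a group, or is occluded."""
        self.lost.append(track)

    def assign(self, x: float, y: float, posture: str, frame: int) -> int:
        """Return a previous identifier for the closest plausible match, else a new one."""
        best, best_dist = None, math.inf
        for tr in self.lost:
            gap = frame - tr.last_frame
            # Skip tracks lost too long ago or whose posture class differs.
            if gap <= 0 or gap > self.max_gap or tr.posture != posture:
                continue
            dist = math.hypot(x - tr.x, y - tr.y)
            # Only accept matches reachable at max_speed within the gap.
            if dist <= self.max_speed * gap and dist < best_dist:
                best, best_dist = tr, dist
        if best is not None:
            self.lost.remove(best)
            return best.track_id
        new_id = self.next_id
        self.next_id += 1
        return new_id
```

The helper issues every identifier itself, so reclaimed and newly created identifiers never collide.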

Various neural network models may be applied to the object selection unit 120 and the posture learning unit 130. The object selection unit 120 may use a known classification model whose weights are already established, and may also undergo a separate training process if necessary. The posture learning unit 130 learns postures from real-time or non-real-time video received through the image source 105, which may be tagged training data. Once training is complete, the trained classification model is applied in the posture classification unit 140. That unit analyzes real-time or non-real-time video from the image source 105 and passes the results to the object identification and tracking unit 150. These neural network models are described in detail below with reference to FIGS. 3 to 5.

The object identification and tracking unit 150 of the illustrated image analysis-based object generating apparatus 100 also adds classification information to the object information stored in the image analysis database 101. The classification follows at least one criterion among the date, time, and weather at which the tracked object appeared, and environmental conditions preselected as capable of affecting object appearance.

The image analysis-based object generating apparatus 100 further includes a virtual object generation unit 160. This unit generates virtual objects for each classification criterion based on the object information stored in the image analysis database 101. Within the classification range of paths and postures for that criterion, it generates virtual objects with paths and postures similar to those of the real objects, in proportion to the real objects' distribution over paths and postures. It then stores these virtual objects in the virtual object database 102.
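One way to read "in proportion to the real objects' distribution" is weighted resampling. The sketch below assumes that the real objects of one classification group have already been binned into discrete path and posture classes. The labels `path_class` and `pose_class` are hypothetical. The sketch draws each bin with probability equal to its share of the real objects. It then perturbs a real path from that bin with Gaussian noise, which gives a similar but not identical path. Both the binning and the noise model are illustrative assumptions, not the patent's procedure.

```python
import random
from collections import defaultdict


def generate_virtual_objects(records, n, seed=0, jitter=0.2):
    """Create n virtual objects for ONE classification group (e.g. rainy weekday mornings).

    records: dicts with 'path_class', 'pose_class', 'path' [(x, y), ...] and 'poses'.
    The (path_class, pose_class) bins and the Gaussian jitter are illustrative assumptions.
    """
    if not records:
        return []
    rng = random.Random(seed)
    bins = defaultdict(list)
    for r in records:
        bins[(r["path_class"], r["pose_class"])].append(r)
    keys = sorted(bins)
    weights = [len(bins[k]) for k in keys]          # empirical distribution of real objects
    virtual = []
    for i, key in enumerate(rng.choices(keys, weights=weights, k=n)):
        src = rng.choice(bins[key])                  # real object to imitate
        path = [(x + rng.gauss(0, jitter), y + rng.gauss(0, jitter)) for x, y in src["path"]]
        virtual.append({"virtual_id": i, "path_class": key[0], "pose_class": key[1],
                        "path": path, "poses": list(src["poses"])})
    return virtual
```

With three "straight/walk" records and one "turn/stand" record, roughly three quarters of the generated objects fall in the first bin.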

The digital twin-based simulation device 200 may request virtual objects for given simulation conditions. In that case, the virtual object generation unit 160 selects from the stored virtual objects those in each classification that matches the simulation conditions. The number selected from each classification follows that classification's distribution, and the selected objects are provided to the device.
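Below is a minimal sketch of the selection step. It assumes that each stored virtual object carries its classification tags and a class key. The sketch first filters by the requested simulation conditions. It then divides the requested count across classes in proportion to their shares among the matches. The largest-remainder rounding and all names here are illustrative choices.

```python
from collections import defaultdict


def select_for_simulation(virtual_db, conditions, total):
    """Pick about `total` stored virtual objects matching `conditions`,
    keeping each class's share equal to its share among the matches.

    virtual_db: dicts with 'classification' (dict) and 'class_key' (sortable).
    Quotas use the largest-remainder method (an illustrative choice).
    """
    matching = [v for v in virtual_db
                if all(v["classification"].get(k) == val for k, val in conditions.items())]
    if not matching or total <= 0:
        return []
    groups = defaultdict(list)
    for v in matching:
        groups[v["class_key"]].append(v)
    exact = {k: total * len(g) / len(matching) for k, g in groups.items()}
    quota = {k: int(e) for k, e in exact.items()}
    leftover = total - sum(quota.values())
    # hand the remaining slots to the classes with the largest fractional parts
    for k in sorted(exact, key=lambda k: exact[k] - quota[k], reverse=True)[:leftover]:
        quota[k] += 1
    selected = []
    for k in sorted(groups):
        selected.extend(groups[k][:quota[k]])   # never more than what is stored
    return selected
```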

FIGS. 3 to 5 relate to the neural networks applied to the object selection unit 120, the posture learning unit 130, and the posture classification unit 140 according to an embodiment of the present invention.

FIG. 3 is a conceptual diagram of an image analysis neural network according to an embodiment of the present invention.

The illustrated configuration is a convolutional neural network (CNN), the type of neural network model specialized for image analysis. It shows the architecture of the well-known VGG16 CNN.

As shown in the figure, the output of each CNN layer, rendered as an image, is called a feature map. For the input layer, the feature map is the input image itself: a matrix with three channels (R, G, and B). The feature maps of the hidden layers are images produced by convolutions, activation functions, and similar operations, and each pixel in them can be regarded as a feature. Each feature map is computed as follows. For each neuron (pixel) of a new feature map, a local neighborhood of the previous layer (its receptive field) is convolved with a weight matrix learned by error backpropagation. The result then passes through transformations such as max, average, or L2 pooling and a nonlinear function (e.g., sigmoid or ReLU). Pooling summarizes the responses within a receptive field into a single value, which yields more robust features (pixels of the feature map).
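The per-layer operations described above can be sketched in plain Python for a single channel. The sketch covers three steps: a stride-1 "valid" convolution, a ReLU nonlinearity, and 2×2 max pooling with stride 2. As in most deep learning libraries, the convolution is implemented as cross-correlation. This is a teaching sketch, not an efficient or multi-channel implementation.

```python
def conv2d(image, kernel, bias=0.0):
    """Valid 2-D cross-correlation (what CNN libraries call convolution), stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw)) + bias
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise nonlinearity: negative responses become 0."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2, stride=2):
    """Summarize each size x size window by its maximum response."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [[max(fmap[i * stride + a][j * stride + b] for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]
```

With a vertical-edge kernel such as `[[-1, 1], [-1, 1]]`, the convolution responds only at the column where the image changes from 0 to 1. Pooling keeps that response at a quarter of the resolution.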

Repeated convolution and pooling build an initial feature hierarchy. This hierarchy is then fine-tuned through fully connected (FC) layers trained by supervised learning, which produce results for a given computer vision task. The final layer uses an activation function suited to the task to output a conditional probability for each output neuron. The whole network is trained by optimizing an objective function such as mean squared error (MSE) or cross-entropy with stochastic gradient descent (SGD), with error backpropagation computing the gradients. For example, VGG16 in FIG. 3 consists of 13 convolutional layers, 3 fully connected layers, and a final softmax classification layer. Its convolutional feature maps are computed with 3×3 convolution filters, and their resolution is halved by max-pooling layers with a stride of 2. Input images of a different size require preprocessing such as resizing or cropping.
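To make the VGG16 layer counts concrete, the following sketch walks through the standard VGG16 configuration. It assumes a 224×224 RGB input, 3×3 convolutions with padding 1 (which preserve spatial size), and 1000 output classes as in ImageNet. It counts layers and parameters (weights plus biases) without building a network.

```python
# Standard VGG16 layout: numbers are conv output channels, "M" is 2x2 max pooling (stride 2).
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_summary(input_size=224, in_channels=3, num_classes=1000):
    size, ch, convs, params = input_size, in_channels, 0, 0
    for v in VGG16_CFG:
        if v == "M":
            size //= 2                          # pooling halves the resolution
        else:
            params += 3 * 3 * ch * v + v        # 3x3 kernel weights + one bias per filter
            ch, convs = v, convs + 1            # padding 1 keeps the spatial size
    flat = size * size * ch                     # flattened input to the first FC layer
    fc_dims = [flat, 4096, 4096, num_classes]   # last FC layer feeds the softmax
    params += sum(a * b + b for a, b in zip(fc_dims, fc_dims[1:]))
    return {"conv_layers": convs, "fc_layers": len(fc_dims) - 1,
            "final_map": (size, size, ch), "flattened": flat, "params": params}
```

Five stride-2 poolings reduce 224 pixels to 7. The first fully connected layer therefore sees a 7×7×512 = 25,088-dimensional input, and most of the roughly 138 million parameters sit in the fully connected layers.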

Compared with other artificial intelligence techniques, CNNs are efficient at image analysis. They are therefore widely used for object identification and scene context analysis.

The object selection unit 120 of the present invention is based on such a CNN and uses an object detection algorithm that operates on regions of interest, as shown in FIG. 4.

FIG. 4 is a conceptual diagram of a neural network for selecting objects of interest in an image according to an embodiment of the present invention.

The goal of general object detection is to locate the objects in an image, classify them, and label each with a rectangular bounding box together with its classification confidence (probability). Object detection frameworks fall into two groups. (1) The traditional approach first proposes object regions and then classifies the object category of each proposed region. (2) The other approach treats detection as a regression and classification problem and produces the final results, category and location, within a single unified framework.

Representative region proposal-based methods include R-CNN, SPP-net, Fast R-CNN, Faster R-CNN, R-FCN, FPN, and Mask R-CNN. Regression/classification-based methods include MultiBox, AttentionNet, G-CNN, YOLO, SSD, YOLOv2, DSSD, and DSOD. The object selection unit 120 of the present invention uses R-CNN (Regions with CNN features). R-CNN is one of the region proposal-based approaches, and it marked a major leap in deep learning-based object detection.

Improving the quality of bounding box candidates and extracting high-level features requires a deeper network architecture. To this end, Ross Girshick proposed R-CNN in 2014. On PASCAL VOC 2012 it achieved a mean average precision (mAP) of 53.3%, a relative improvement of more than 30% over the previous best result (DPM HSC). FIG. 4 shows the flow of R-CNN.

First, R-CNN uses selective search to generate about 2,000 region proposals per image. Selective search relies on simple bottom-up grouping and saliency cues to provide more accurate candidates. It finds boxes of arbitrary size quickly, which reduces the search space for object detection.

Next, CNN-based features are extracted. Each proposed region is converted to a fixed size by warping or cropping, and the CNN module extracts a 4096-dimensional feature vector as its final representation. The CNN's high-level learning ability, expressive power, and hierarchical structure give each proposed region an abstract, semantic, and robust feature-based representation.
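The fixed-size conversion can be illustrated with an anisotropic warp: crop the proposal and resample it to the CNN's input size regardless of its aspect ratio. For simplicity, this sketch uses nearest-neighbour sampling on a nested-list image with integer box coordinates. The default 227×227 matches the input size used in the original R-CNN paper. The helper name and the interpolation choice are assumptions.

```python
def warp_region(image, box, out_h=227, out_w=227):
    """Crop box = (x1, y1, x2, y2) (integer pixels, x2/y2 exclusive) and resize it
    to out_h x out_w with nearest-neighbour sampling, ignoring aspect ratio."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    return [[image[y1 + (i * h) // out_h][x1 + (j * w) // out_w] for j in range(out_w)]
            for i in range(out_h)]
```

Each output row and column maps back to a source pixel inside the box, so a 4×2 proposal stretched to 8×4 repeats each source pixel twice horizontally and twice vertically.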

Classification and localization follow. Linear SVMs (support vector machines), one per category and trained in advance for multiple classes, score each proposed region as object (+) or background (-). The scored regions are refined by bounding box regression, and greedy non-maximum suppression (NMS) is applied to produce the final bounding boxes.
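The last two steps can be sketched as follows. The first is decoding R-CNN-style bounding box regression offsets: center shifts are scaled by the box size, and width and height change in log space. The second is greedy NMS. It keeps the highest-scoring box and discards any lower-scoring box that overlaps a kept box by more than an IoU threshold. The 0.5 threshold is an illustrative default, not a value taken from the patent.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def apply_box_regression(box, deltas):
    """R-CNN parameterization: Gx = Pw*dx + Px, Gy = Ph*dy + Py, Gw = Pw*exp(dw), Gh = Ph*exp(dh)."""
    x1, y1, x2, y2 = box
    pw, ph = x2 - x1, y2 - y1
    px, py = x1 + pw / 2, y1 + ph / 2
    dx, dy, dw, dh = deltas
    gx, gy = px + pw * dx, py + ph * dy
    gw, gh = pw * math.exp(dw), ph * math.exp(dh)
    return (gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2)


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```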

In this way, the desired objects in the image can be selected effectively.

FIG. 5 is a conceptual diagram of a neural network for determining the posture of objects in an image according to an embodiment of the present invention.

The embodiment uses a pose tracking framework that tracks the poses of multiple human objects in a video. It assigns a unique instance ID to each person's keypoints across video frames. Most existing pose tracking methods work offline and focus on tracking accuracy rather than real-time processing. They therefore perform human detection, candidate pose estimation, and identity association in sequence. They then compute the tracking result from the pose estimates by solving an optimization problem, which requires the poses in subsequent frames to be computed in advance.

The present invention instead applies the LightTrack method. LightTrack is a generic top-down approach in which pose estimation is performed after candidates for pose estimation have been detected. Its advantage is that video can be processed in real time.

The present invention effectively selects human objects from video, tracks them, and determines their postures. It exploits the fact that a person's posture can be used to infer that person's position more accurately.

In a top-down approach, pose estimation follows candidate detection, so an accurate human position makes the person's pose easy to estimate. Consequently, (1) an approximate human position can be fed into single-person pose estimation. (2) In turn, the positions of the person's joints can be used to approximately localize the person. (3) Iterating these two steps makes single-person pose tracking (SPT) possible.
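The mutual dependence between position and pose in steps (1) to (3) can be sketched as a loop. The loop estimates the pose inside the current box, then derives the next frame's box from the estimated joints. The enlargement factor, the confidence cut-off, and the `estimate_pose` callback are hypothetical stand-ins, not LightTrack's actual interface.

```python
def box_from_keypoints(keypoints, enlarge=0.2, min_conf=0.0):
    """Derive a person box from (x, y, confidence) joints, widened by `enlarge`
    of its width/height in total (half on each side). Returns None if no joint is confident."""
    pts = [(x, y) for x, y, c in keypoints if c > min_conf]
    if not pts:
        return None
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    x1, x2, y1, y2 = min(xs), max(xs), min(ys), max(ys)
    mx, my = (x2 - x1) * enlarge / 2, (y2 - y1) * enlarge / 2
    return (x1 - mx, y1 - my, x2 + mx, y2 + my)


def track_single_person(frames, initial_box, estimate_pose, enlarge=0.2):
    """Single-person pose tracking (SPT) loop: box -> pose -> box for the next frame.

    estimate_pose(frame, box) -> [(x, y, conf), ...] is a hypothetical callback
    standing in for a single-person pose estimator."""
    box, poses = initial_box, []
    for frame in frames:
        keypoints = estimate_pose(frame, box)
        poses.append(keypoints)
        next_box = box_from_keypoints(keypoints, enlarge)
        if next_box is None:   # target lost: the detector must be re-run
            break
        box = next_box
    return poses
```

The loop stops as soon as no confident joint remains. That is the point at which a top-down tracker falls back to running the detector again.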

Multi-target pose tracking (MPT) could simply repeat SPT for each person. It is better, however, to track multiple people simultaneously and identify them through additional re-identification (Re-ID).

Accordingly, the object identification and tracking unit 150 according to the embodiment tracks multiple objects simultaneously.

Object tracking must handle the following situations: (1) a person disappears from the camera view or is occluded; (2) new candidates appear in the camera view, or candidates that had disappeared reappear; (3) people move toward each other and appear as a single object; and (4) tracking fails because the camera moves or zooms quickly.

In these situations, a lost object could simply be tracked anew as a new object, but then it would be difficult to obtain a continuous movement path for each person. The embodiment therefore uses human posture to keep the identities of tracked objects consistent as far as possible.

That is, the detection module that reintroduces object candidates runs only when candidates are lost because of occlusion or camera movement. The reintroduced candidates are then associated with the targets tracked in the previous frame by pose matching.

Identity association uses both spatial consistency and pose consistency. Re-identification requires matching newly observed objects against tracked object candidates, and this is done with a visual feature classifier. In the embodiment, a graph convolution network (GCN) represents the joints of a human object as a graph. This graph representation of the human skeleton is then used in the candidate matching step of pose matching.
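A simplified association step is sketched below. It assumes that each frame yields (box, keypoints) candidates with the same joint order. The sketch checks spatial consistency first, using box IoU against each track's last position. If that fails, it uses pose consistency for re-identification. If neither matches, it creates a new identifier. The translation- and scale-normalized joint distance only stands in for the learned Siamese GCN embedding distance described in this embodiment, and both thresholds are illustrative.

```python
import itertools
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pose_distance(p, q):
    """Stand-in for a learned pose embedding distance: mean joint distance
    after removing translation and scale (NOT the embodiment's Siamese GCN)."""
    def normalize(kps):
        cx = sum(x for x, _ in kps) / len(kps)
        cy = sum(y for _, y in kps) / len(kps)
        pts = [(x - cx, y - cy) for x, y in kps]
        scale = max(math.hypot(x, y) for x, y in pts) or 1.0
        return [(x / scale, y / scale) for x, y in pts]
    return sum(math.hypot(ax - bx, ay - by)
               for (ax, ay), (bx, by) in zip(normalize(p), normalize(q))) / len(p)


class PoseTracker:
    """Spatial consistency (IoU) first, then pose consistency (re-ID), else a new ID."""

    def __init__(self, iou_threshold=0.5, pose_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.pose_threshold = pose_threshold
        self.tracks = {}                  # track_id -> (last box, last keypoints)
        self._next_id = itertools.count()

    def update(self, candidates):
        """candidates: list of (box, keypoints) for one frame; returns their track IDs."""
        free = dict(self.tracks)          # each stored track can be claimed once per frame
        ids = []
        for box, kps in candidates:
            tid = max(free, key=lambda t: iou(box, free[t][0]), default=None)
            if tid is None or iou(box, free[tid][0]) < self.iou_threshold:
                tid = min(free, key=lambda t: pose_distance(kps, free[t][1]), default=None)
                if tid is None or pose_distance(kps, free[tid][1]) > self.pose_threshold:
                    tid = next(self._next_id)   # unseen person: new identity
            free.pop(tid, None)
            self.tracks[tid] = (box, kps)
            ids.append(tid)
        return ids
```

Lost tracks stay in `tracks`. A person who reappears with a matching pose therefore gets the previous identifier back instead of a new one.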

This GCN-based posture classification method may be applied to the posture classification unit 140.

The configuration shown in FIG. 5 is the network structure of a Siamese GCN.

Given the 2D coordinates of a person's joints, a spatial graph can be built with the joints as nodes and the connections of the human body structure as edges. In the embodiment, the joint coordinate vectors at the graph nodes are the GCN input. Multiple graph convolution operations produce a feature representation that serves as a conceptual summary of the human pose.
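The skeleton graph and one graph convolution can be sketched with the widely used propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). Here A is the joint adjacency matrix, D the degree matrix of A + I, H the node features (here, 2D joint coordinates), and W a weight matrix. The five-joint skeleton and this particular normalization are illustrative assumptions; the embodiment does not specify them.

```python
import math

# Hypothetical 5-joint skeleton: 0 head, 1 neck, 2 pelvis, 3 left hand, 4 right hand.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def normalized_adjacency(n, edges):
    """D^(-1/2) (A + I) D^(-1/2) for an undirected joint graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0                                          # bones as edges
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph convolution: each joint mixes its neighbours' features, then ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]
```

Stacking several such layers, each with its own W, yields the multi-layer graph convolution the text describes. Every joint's feature then reflects progressively larger parts of the skeleton.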

The distance between the network's output features determines how similar two poses are, that is, whether they match. The illustrated Siamese GCN for pose matching consists of two GCN layers and one convolutional layer trained with a contrastive loss. The two feature vectors are extracted from a pair of input graphs by branches that share the same weights. As a result, the feature vectors encode the spatial relationships among the person's joints.
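The Siamese idea can be sketched with a toy shared encoder. Both poses pass through the same weights, and the Euclidean distance between the embeddings decides whether they match. A contrastive loss of the common form L = y·d² + (1 - y)·max(0, m - d)² pulls matching pairs together and pushes non-matching pairs apart by at least the margin m. The linear encoder on centered joints merely stands in for the GCN branches, and the threshold and margin values are assumptions.

```python
import math


def embed(pose, weights):
    """Shared encoder (stand-in for a GCN branch): linear map of centred, flattened joints."""
    cx = sum(x for x, _ in pose) / len(pose)
    cy = sum(y for _, y in pose) / len(pose)
    flat = [v for x, y in pose for v in (x - cx, y - cy)]
    return [sum(w * v for w, v in zip(row, flat)) for row in weights]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(d, same, margin=1.0):
    """Matching pairs are penalized by distance; non-matching pairs only inside the margin."""
    return d ** 2 if same else max(0.0, margin - d) ** 2


def siamese_match(pose_a, pose_b, weights, threshold=0.5):
    """Both branches use the SAME weights; a small embedding distance means a match."""
    d = euclidean(embed(pose_a, weights), embed(pose_b, weights))
    return d, d <= threshold
```

Weight sharing is what makes the distance meaningful: identical poses map to identical embeddings, wherever they appear in the frame.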

This skeleton-based representation captures the similarity of human poses effectively. It is also computationally efficient and robust to camera movement.

FIG. 6 is an exemplary diagram illustrating posture determination results for objects in an image according to an embodiment of the present invention.

As shown, humans, the objects of interest in the image, are selected and tracked, each with its own assigned identifier. At the same time, skeleton-based posture information is computed for each person and used to keep object identities consistent over time.

FIG. 7 is a flowchart of the operation process according to an embodiment of the present invention.

As shown, the invention selects objects with R-CNN, estimates the posture at each selected object's position, and learns these postures.

Once learning is complete, objects are selected from the image source with R-CNN. The objects in the video are tracked simultaneously while the posture of each tracked object is classified. Objects may cluster, disappear, or become occluded, and new object candidates may appear. In all these cases, the tracked objects are managed so that their identities remain consistent based on posture.

The tracking path and posture information of each object are stored together with classification information. This includes the date and time at which the tracked object appeared, the weather, and environmental conditions selected in advance as capable of affecting the appearance of objects.

The accumulated object information is then classified, and virtual objects are generated for each classification. Within the range of paths and postures for each classification criterion, these virtual objects have paths and postures similar to those of the real objects. They are generated in proportion to the real objects' distribution over paths and postures.

Although not shown, the digital twin-based simulation device may subsequently request virtual objects for given simulation conditions. The stored virtual objects of each classification matching those conditions can then be selected according to the distribution of each classification and provided.

Through this embodiment, digital twin-based simulations can be supplied with realistic movements of virtual human objects, which improves the quality of the simulations.

The various devices and components described in this specification may be implemented by hardware circuitry (e.g., CMOS-based logic circuitry), firmware, software, or a combination thereof. For example, they may be implemented with transistors, logic gates, and electronic circuits in various electrical structures.

Those of ordinary skill in the art to which the present invention pertains will be able to modify and vary the foregoing without departing from the essential characteristics of the present invention. The embodiments disclosed herein are therefore intended to explain, not to limit, the technical spirit of the present invention. These embodiments do not limit the scope of that technical spirit. The scope of protection of the present invention should be interpreted according to the following claims, and all technical ideas within their equivalent scope should be interpreted as falling within the scope of rights of the present invention.

100: image analysis-based object generating apparatus    110: image collecting unit
120: object selection unit    130: posture learning unit
140: posture classification unit    150: object identification and tracking unit
160: virtual object generation unit
200: digital twin-based simulation device

Claims

An image analysis system for digital twin technology, comprising: an image analysis-based object generating apparatus that selects objects from images collected through a camera or an image database; learns the posture of each selected object through a neural network model and calculates it as posture information; tracks each selected object while assigning it an identifier, to obtain the object's identifier and movement path; and stores object information including the identifier, movement path, and posture information of the selected object in an image analysis database, adding to the stored object information classification information according to at least one classification criterion among the date and time at which the selected object appeared, the weather, and environmental conditions selected in advance as capable of affecting the appearance of objects; the apparatus thereby storing, in the image analysis database, object information including the classification information for each of a plurality of objects selected from the images collected through the camera or image database, generating virtual objects for each classification criterion based on the object information stored in the image analysis database, and storing them in a virtual object database;
and a digital twin-based simulation unit that configures virtual reality model data for a simulation target in a virtual reality environment, requests virtual objects according to simulation conditions from the image analysis-based object generating apparatus, receives the corresponding virtual object information for each classification, and applies it to the virtual reality environment;
wherein the image analysis-based object generating apparatus generates virtual objects for each classification criterion based on the object information accumulated in the image analysis database, generating, within the classification range of paths and postures of the virtual objects for each classification criterion, virtual objects having similar paths and postures in proportion to the distribution of the real objects over postures and paths, and stores them in the virtual object database; and, when the digital twin-based simulation device requests virtual objects according to a simulation condition including at least one of the date, time, weather, and environmental conditions selected in advance as capable of affecting the appearance of objects, selects from the stored virtual objects the virtual objects of each classification matching the simulation condition according to the distribution of each classification, and provides them.

The image analysis system for digital twin technology according to claim 1, wherein the image analysis-based object generating apparatus comprises:
an image collecting unit for collecting images through a camera or an image database;
an object selection unit that selects a region of interest from the images collected by the image collection unit and selects objects existing in the region of interest;
a posture learning unit that learns, through a neural network model, the postures of the objects selected by the object selection unit;
a posture classification unit that classifies the postures of the objects selected by the object selection unit through the neural network model to which the weights learned by the posture learning unit are applied; and
an object identification and tracking unit that assigns an identifier to each of the plurality of objects selected by the object selection unit and continuously tracks the plurality of objects simultaneously; links the posture of each object classified by the posture classification unit to the tracked objects; when a clustered, lost, or occluded object reappears, maintains the previous object identifier or creates a new one based on the classified posture; and stores object information including each tracked object's identifier, movement path, and posture information in the image analysis database.

The image analysis system for digital twin technology according to claim 2, wherein the object identification and tracking unit adds to the stored object information classification information according to at least one classification criterion among the date and time at which the tracked object appeared, the weather, and environmental conditions selected in advance as capable of affecting the appearance of objects, and stores it in the image analysis database.

The image analysis system for digital twin technology according to claim 3, wherein the image analysis-based object generating apparatus further comprises a virtual object generation unit that generates virtual objects for each classification criterion based on the object information stored in the image analysis database, generating, within the classification range of paths and postures of the virtual objects for each classification criterion, virtual objects having similar paths and postures in proportion to the distribution of the real objects over postures and paths, and stores them in a virtual object database.

The image analysis system for digital twin technology according to claim 4, wherein, when the digital twin-based simulation device requests virtual objects according to simulation conditions, the virtual object generation unit selects from the stored virtual objects the virtual objects of each classification matching the simulation conditions according to the distribution of each classification, and provides them.