KR20210109927A

KR20210109927A - System, method and program of constructing dataset for training visual characteristic recognition model

Info

Publication number: KR20210109927A
Application number: KR1020200025172A
Authority: KR
Inventors: 이종혁
Original assignee: 주식회사 서르
Priority date: 2020-02-28
Filing date: 2020-02-28
Publication date: 2021-09-07

Abstract

Provided are a method, apparatus, and program for constructing a dataset for training a visual characteristic recognition model. In accordance with an embodiment of the present invention, the method for constructing the dataset for training the visual characteristic recognition model comprises: a training image data acquisition step (S100) in which a server acquires a plurality of pieces of training image data; a training image data transmission step (S200) in which the server transmits the plurality of pieces of training image data to an expert client; an individual characteristic reception step (S300) in which the server receives, from the expert client, an individual characteristic for a visual attribute of each piece of the training image data; and a step (S400) in which the server constructs a training dataset based on the plurality of individual characteristics received for each piece of the training image data. The present invention aims to provide a method, apparatus, and program for constructing a dataset with which the visual characteristic recognition model can be trained more precisely.

Description

Dataset construction method, device and program for learning visual characteristics recognition model

The present invention relates to a method, apparatus, and program for constructing a dataset for training a visual characteristic recognition model.

With the recent development of the Internet, social media network services have grown rapidly. As the volume of multimedia has exploded, effective image search systems are required, and image annotation is becoming increasingly important because the explosive growth of web images calls for efficient image search.

Most image retrieval research has focused on content-based image retrieval (CBIR), which analyzes the content of an image using visual features such as color, texture, and shape. This approach works well when the number of defined tags is small, but its performance degrades as the dataset grows and the tags become more varied.

Text-based image retrieval (TBIR) searches for images that correspond to a text query. In this approach, the visual content of an image is represented by manually tagged text descriptors, which a dataset management system uses to perform the image search. In other words, existing image or video search relies on information tagged directly by users. Consequently, if a user tags an image with the wrong keyword, the search results become inaccurate. In addition, because users may define keywords differently, the results differ depending on the keywords chosen by the user who registered the image.

Laid-open Patent Publication No. 10-2018-0133200, 2018.12.13

An object of the present invention is to provide a method, apparatus, and program for constructing a training dataset for training a visual characteristic recognition model that can accurately output the individual characteristics describing the visual features of an input image (image data).

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

A method for constructing a dataset for training a visual characteristic recognition model according to an embodiment of the present invention includes: a training image data acquisition step in which a server acquires a plurality of pieces of training image data, each including one or more objects; a training image data transmission step in which the server transmits the training image data to a plurality of expert clients according to the image type of each piece of training image data, each expert client being the client of an expert in charge of classifying one of the visual attributes in a visual attribute combination; an individual characteristic reception step in which the server receives, from the expert clients, individual characteristics for the visual attributes of each piece of training image data; and a step in which the server generates appearance description data for each piece of training image data based on the plurality of individual characteristics received for it, thereby constructing a training dataset. Here, a visual attribute is a specific classification criterion for describing the visual features of the object and includes a plurality of individual characteristics that express the various visual features within that criterion. The visual attribute combination is a preset set of visual attributes to be determined for a specific image type. The training dataset is used to train one or more visual characteristic recognition models, and each visual characteristic recognition model is trained using the training image data in the training dataset together with one or more individual characteristics extracted from the corresponding appearance description data.

In another embodiment, the visual characteristic recognition model includes a plurality of individual attribute recognition modules that each determine a different visual attribute. Each individual attribute recognition module is trained on its own individual training dataset, in which the training image data are matched with the individual characteristics for one specific visual attribute extracted from the appearance description data.
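The per-attribute split described in this embodiment can be illustrated with a minimal Python sketch. The dictionary layout, the image identifiers, and the function name `split_by_attribute` are assumptions made for illustration only; the specification does not prescribe a data format.

```python
from collections import defaultdict

# Hypothetical training dataset: image id -> appearance description data,
# represented here as a mapping from visual attribute to individual characteristic.
training_dataset = {
    "img_001": {"color": "red", "pattern": "stripe"},
    "img_002": {"color": "blue", "pattern": "dot"},
    "img_003": {"color": "red"},  # no 'pattern' label was received for this image
}


def split_by_attribute(dataset):
    """Build one list of (image id, individual characteristic) pairs per visual attribute."""
    per_attribute = defaultdict(list)
    for image_id, description in dataset.items():
        for attribute, characteristic in description.items():
            per_attribute[attribute].append((image_id, characteristic))
    return dict(per_attribute)


# Each entry could serve as the individual training dataset of one recognition module.
individual_datasets = split_by_attribute(training_dataset)
```

In this sketch, an image that lacks a label for some attribute is simply left out of that attribute's individual dataset.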

In another embodiment, the visual attribute combination is composed of a plurality of detailed visual attributes, each being the lowest-level visual attribute under a given visual attribute.

In another embodiment, the appearance description data take the form of a code string obtained by extracting and combining the code values that correspond to the individual characteristics of the training image data.

In another embodiment, the code value corresponding to a detailed individual characteristic includes individual characteristic information for one or more higher-level visual attributes of the detailed visual attribute.
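One way to picture code-string appearance description data is the following Python sketch. The code values, the `|` separator, and the convention that a detailed code starts with the code of its higher-level characteristic are all hypothetical; the specification does not define a concrete encoding.

```python
# Hypothetical code table: (visual attribute, individual characteristic) -> code value.
# The detailed attribute 'stripe_direction' sits under the 'pattern' attribute, so its
# code values begin with the code assumed for the higher-level characteristic 'stripe' (P02).
CODE_TABLE = {
    ("color", "red"): "C01",
    ("color", "blue"): "C02",
    ("pattern", "dot"): "P01",
    ("stripe_direction", "vertical"): "P02-1",
    ("stripe_direction", "horizontal"): "P02-2",
    ("sleeve_length", "short"): "S01",
}


def to_code_string(characteristics, separator="|"):
    """Combine the code values of all individual characteristics into one code string."""
    codes = [CODE_TABLE[(attr, value)] for attr, value in characteristics.items()]
    return separator.join(sorted(codes))


description = to_code_string(
    {"color": "red", "stripe_direction": "vertical", "sleeve_length": "short"}
)
print(description)  # C01|P02-1|S01
```

Because the detailed code `P02-1` carries the prefix `P02`, the higher-level characteristic can be read back from the detailed code without storing it separately.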

In another embodiment, when a new visual attribute is added, the method further includes: an additional training image data transmission step in which the server transmits the training image data in the training dataset to the client of an expert in charge of classifying the new visual attribute; a step in which the server receives, from that expert client, a new individual characteristic for the new visual attribute of each piece of training image data; and a step in which the server extracts the code values corresponding to the received new individual characteristics and adds them to the appearance description data of each piece of training image data.

In another embodiment, the method further includes a step in which the server adds the new visual attribute to the visual attribute combination of one or more image types, and the additional training image data transmission step transmits only the training image data of the image types to which the new visual attribute has been added.
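The update flow for a new visual attribute might look like the following Python sketch. The dataset layout, the code values, and the `label_fn` callback (which stands in for the round trip to the expert client) are assumptions for illustration.

```python
# Hypothetical existing dataset: image id -> (image type, code string).
dataset = {
    "img_001": ("clothing", "C01|S01"),
    "img_002": ("surgery", "T03"),
    "img_003": ("clothing", "C02|S02"),
}

# Hypothetical visual attribute combinations per image type.
attribute_combinations = {
    "clothing": ["color", "sleeve_length"],
    "surgery": ["surgical_tool"],
}


def add_visual_attribute(new_attribute, image_types, label_fn):
    """Register a new attribute and append its code value to the affected images only."""
    for image_type in image_types:
        attribute_combinations[image_type].append(new_attribute)
    for image_id, (image_type, codes) in dataset.items():
        if image_type in image_types:      # only these images are sent to the expert
            new_code = label_fn(image_id)  # stands in for the expert client's answer
            dataset[image_id] = (image_type, f"{codes}|{new_code}")


# Stubbed expert answers; the code values are illustrative.
neckline_labels = {"img_001": "N01", "img_003": "N02"}
add_visual_attribute("neckline", ["clothing"], neckline_labels.__getitem__)
```

Images of other image types keep their existing code strings untouched, which is what makes the update inexpensive in this sketch.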

In another embodiment, the visual attributes include a dynamic visual attribute for describing the movement of a specific object, and when the image data are video data containing a plurality of frames, the training dataset construction step acquires individual characteristics for each frame of the video data and dynamic individual characteristics across a plurality of frames.

An apparatus for constructing a dataset for training a visual characteristic recognition model according to another embodiment of the present invention includes one or more computers and performs the dataset construction method described above.

A program for constructing a dataset for training a visual characteristic recognition model according to yet another embodiment of the present invention is combined with hardware to execute the dataset construction method described above and is stored in a medium.

Other specific details of the present invention are included in the detailed description and the drawings.

According to the present invention, having the same expert judge a specific visual attribute across a plurality of pieces of training image data yields a training dataset built on a unified standard, so that the visual characteristic recognition model can be trained more accurately.

In addition, according to the present invention, the visual attributes that apply to a specific image type are set in advance, and training image data of that image type are transmitted only to the expert clients in charge of those visual attributes, from which the individual characteristics are received. This improves the efficiency of constructing the training dataset.

In addition, according to the present invention, when a new visual attribute is added, the training dataset can be updated simply by adding the code values corresponding to the new individual characteristics to the appearance description data of the training image data in the already constructed training dataset.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a flowchart of a method for constructing a training dataset according to an embodiment of the present invention.
FIG. 2 is a block diagram of a system for constructing a training dataset according to an embodiment of the present invention.
FIG. 3 is a block diagram of a visual characteristic recognition model according to an embodiment of the present invention.
FIG. 4 is a flowchart of a method for constructing a training dataset when a new visual attribute is added, according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the invention pertains; the invention is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components other than those stated. Like reference numerals refer to like components throughout the specification, and "and/or" includes each of the stated components and every combination of one or more of them. Although "first", "second", and so on are used to describe various components, these components are not limited by these terms. The terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may also be a second component within the technical spirit of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless they are clearly and specifically defined.

In this specification, a 'computer' includes any device that can perform arithmetic processing and provide the result to a user. For example, a computer may be not only a desktop PC or a notebook but also a smartphone, a tablet PC, a cellular phone, a PCS (Personal Communication Service) phone, a synchronous or asynchronous IMT-2000 (International Mobile Telecommunication-2000) mobile terminal, a Palm PC (Palm Personal Computer), a personal digital assistant (PDA), and the like. A head mounted display (HMD) device that includes a computing function may also be a computer. In addition, a computer may be a server that receives a request from a client and performs information processing.

In this specification, a 'client' refers to any device with a communication function on which users can install and use a program (or application). That is, a client device may include, without limitation, one or more of a telecommunication device such as a smartphone, tablet, PDA, laptop, smartwatch, or smart camera, and a remote controller.

In this specification, 'image data' means a two-dimensional or three-dimensional static or dynamic image that includes one or more objects. That is, 'image data' may be static image data consisting of a single frame or dynamic image data (that is, video data) in which a plurality of frames are consecutive.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart of a method for constructing a training dataset according to an embodiment of the present invention.

Referring to FIG. 1, a method for constructing a dataset for training a visual characteristic recognition model according to an embodiment of the present invention includes: a training image data acquisition step (S100) in which a server acquires a plurality of pieces of training image data; a training image data transmission step (S200) in which the server transmits the plurality of pieces of training image data to expert clients; an individual characteristic reception step (S300) in which the server receives, from the expert clients, individual characteristics for the visual attributes of each piece of training image data; and a step (S400) in which the server constructs a training dataset based on the plurality of individual characteristics received for each piece of training image data. Each step is described in detail below.
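The four steps S100 to S400 can be sketched end to end in Python as follows. This is only an illustrative outline: the function names are hypothetical, the expert clients are replaced by plain callables, and no networking or storage is modeled.

```python
def acquire_images():
    """S100: the server acquires training image data (hard-coded identifiers here)."""
    return ["img_001", "img_002"]


def send_to_experts(images, experts):
    """S200: pair every expert (one per visual attribute) with the images to label."""
    return {attr: (labeler, list(images)) for attr, labeler in experts.items()}


def receive_characteristics(assignments):
    """S300: collect the individual characteristic each expert returns per image."""
    received = {}
    for attr, (labeler, images) in assignments.items():
        for image_id in images:
            received.setdefault(image_id, {})[attr] = labeler(image_id)
    return received


def build_dataset(received):
    """S400: assemble (image, individual characteristics) records into a dataset."""
    return [(image_id, labels) for image_id, labels in sorted(received.items())]


# Stub experts: each callable stands in for one expert client.
experts = {"color": lambda image_id: "red", "pattern": lambda image_id: "stripe"}
dataset = build_dataset(
    receive_characteristics(send_to_experts(acquire_images(), experts))
)
```

Keeping one callable per visual attribute mirrors the idea that a single expert judges a given attribute for all images.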

The server 10 acquires a plurality of pieces of training image data (S100). The training image data are image data used to train a learning model.

In one embodiment, acquiring the training image data includes the server acquiring the training image data from an image provider client. The image provider client is the client of an image provider who uploads specific image data to the server.

In one embodiment, acquiring the training image data includes specific image data being uploaded to the server.

For example, when an image of an item for sale is uploaded to a shopping mall, the image provider client is the client of the person who uploads the image to the shopping mall, and the image of the item may serve as training image data.

In one embodiment, the learning model is a visual characteristic recognition model 100. The visual characteristic recognition model 100 according to an embodiment of the present invention is described below with reference to FIGS. 2 and 3.

In one embodiment, the visual characteristic recognition model 100 is a learning model that, when specific image data are input, outputs the individual characteristics for a plurality of visual attributes of that image data.

A visual attribute is a specific classification criterion for describing or annotating the visual features of a specific object, and it includes a plurality of individual characteristics that express the various visual features within that criterion.

For example, when the object is clothing, the visual attributes are classification criteria for the visual features of clothing, such as pattern, color, fit, and length. When the object is a person, the visual attributes may be posture, arm motion, hand shape, and the like.

In one embodiment, the visual attributes include not only visual attributes for describing visual features such as the appearance of a stationary object but also dynamic visual attributes, such as 'arm motion', for describing the motion of a moving object.

In addition, in one embodiment, a visual attribute may be a specific classification criterion for describing the visual features of a plurality of objects at the same time. For example, for the 'surgical operation' visual attribute, the individual characteristic (which surgical operation is being performed) may be determined according to the object types 'surgical tool' and 'organ (surgical target)'.

In addition, in one embodiment, the visual characteristic recognition model 100 includes a plurality of individual attribute recognition modules 110 that determine different visual attributes, as shown in FIG. 2. That is, the visual characteristic recognition model 100 includes a plurality of individual attribute recognition modules 110, each specialized in recognizing one visual attribute. The more kinds of visual attributes there are, the more individual attribute recognition modules 110 the server 10 includes in the visual characteristic recognition model 100. Each individual attribute recognition module 110 determines the individual characteristic within a specific visual attribute.

In addition, in one embodiment, the visual characteristic recognition model 100 includes a plurality of specialized visual characteristic recognition models 120, each for recognizing the visual features of image data of a specific image type, as shown in FIG. 3.

In one embodiment, each specialized visual characteristic recognition model 120 includes a combination of individual attribute recognition modules 110 that differs by image type.

Specifically, because the visual features to be determined from image data may differ depending on the type of the image data, a visual attribute combination, that is, a combination of the visual attributes that apply to image data of each image type, is set.

Then, a specialized visual characteristic recognition model for each image type is generated by combining the individual attribute recognition modules that determine the respective visual attributes in the visual attribute combination for that image type.

For example, for a plurality of image types (clothing images, surgery images, and exercise images), visual attribute combinations may be set as follows: the 'color', 'pattern', 'shoulder shape', and 'sleeve length' visual attributes for the clothing image type; the 'surgical tool', 'organ', and 'movement of the surgical tool' visual attributes for the surgery image type; and the 'exercise equipment', 'angle of arms and legs', and 'finger shape' visual attributes for the exercise image type. A specialized visual characteristic recognition model for each image type may then be generated on this basis.
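The composition of a specialized recognition model from per-attribute modules might be sketched in Python as below. The attribute names follow the example above, while the `AttributeModule` class, its placeholder `predict` output, and `build_specialized_model` are hypothetical stand-ins for trained modules.

```python
# Visual attribute combinations per image type, taken from the example in the text.
VISUAL_ATTRIBUTE_COMBINATIONS = {
    "clothing": ["color", "pattern", "shoulder_shape", "sleeve_length"],
    "surgery": ["surgical_tool", "organ", "surgical_tool_movement"],
    "exercise": ["exercise_equipment", "arm_leg_angle", "finger_shape"],
}


class AttributeModule:
    """Placeholder for an individual attribute recognition module (110)."""

    def __init__(self, attribute):
        self.attribute = attribute

    def predict(self, image):
        # A real module would run a trained classifier; this returns a dummy label.
        return f"<{self.attribute} of {image}>"


def build_specialized_model(image_type):
    """Combine the modules of one image type into a specialized model (120)."""
    modules = [AttributeModule(a) for a in VISUAL_ATTRIBUTE_COMBINATIONS[image_type]]

    def recognize(image):
        return {m.attribute: m.predict(image) for m in modules}

    return recognize


clothing_model = build_specialized_model("clothing")
result = clothing_model("shirt.jpg")
```

Under these assumptions, supporting a further image type only requires a new entry in the combination table.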

In addition, although not shown in the drawings, in one embodiment the specialized visual characteristic recognition model for each image type may include specialized visual characteristic recognition models for a plurality of subtypes within that image type.

For example, the visual features to be determined may differ between subtypes within an image type, such as top and bottom images among clothing images, or baseball and soccer images among exercise images. A subtype-specialized visual characteristic recognition model composed of the individual attribute recognition modules that apply to each subtype may therefore be generated.

In one embodiment, the plurality of visual attributes and the plurality of individual characteristics for each visual attribute may be set by the server. That is, the server sets the visual attributes, which are the criteria for judging the visual features of the training image data, and, for each visual attribute, the individual characteristics, which are the kinds of features with which the training image data will be labeled.

In one embodiment, the server may receive, from the client of an expert in analyzing image data of a specific image type, the settings for a plurality of visual attributes (that is, a visual attribute combination) for describing the visual features of image data of that image type and for the plurality of individual characteristics within each visual attribute.

For example, when constructing the specialized visual characteristic recognition model 120 for the clothing image type, the server receives appropriate visual attributes and the individual characteristics they contain from the client of a designer, who is a clothing expert, and sets them. In this way the server can construct an individual attribute recognition module 110 for each visual attribute and a specialized visual characteristic recognition model 120 for the clothing image type that includes the plurality of individual attribute recognition modules.

Next, the server transmits the plurality of pieces of training image data to the expert clients (S200).

An expert client is the client of an expert who assigns individual characteristics to the training image data (that is, who labels the training image data).

In one embodiment, each expert client is the client of an expert who is in charge of classifying one visual attribute. That is, the individual characteristics for a specific visual attribute may be assigned by the same expert.

For example, the individual characteristics of the 'color' visual attribute for a plurality of pieces of training image data may be assigned by the expert in charge of 'color'. Because the judgment of some visual attributes can be somewhat subjective, having the same expert judge a specific visual attribute across the training image data yields a training dataset built on a unified standard, so that the visual characteristic recognition model can be trained more accurately.

In one embodiment, the training image data transmission step (S200) may consist of the server transmitting all of the training image data it has acquired to all of the expert clients. That is, all acquired training image data is transmitted to the expert client in charge of a given visual attribute, regardless of image type.

In this case, the expert client in charge of a visual attribute that does not apply to a particular image type may transmit information to that effect to the server.

For example, when training image data of the clothing image type is transmitted to the expert client in charge of the 'surgical tool' visual attribute, that client may transmit to the server information indicating that "the visual attribute cannot be applied (the individual characteristic cannot be determined)".

In another embodiment, the training image data transmission step (S200) may consist of transmitting the training image data to the expert clients in charge of each visual attribute set to be applied to the image type of that training image data (that is, each visual attribute included in the visual attribute combination).

Specifically, referring to the example above, clothing images may be transmitted to the expert clients in charge of 'color', 'pattern', 'shoulder shape', and 'sleeve length', respectively, and surgical images may be transmitted to the expert clients in charge of 'surgical tool', 'organ', and 'movement of the surgical tool', respectively.

Also, in one embodiment, the visual attributes include specialized visual attributes, which apply only to objects of a particular type, and general-purpose visual attributes, which apply to objects of every type. The following description uses clothing objects in clothing images as an example.

For example, visual attributes such as 'color' or 'texture' are general-purpose visual attributes, because they can be applied to every clothing object (an individual characteristic can be derived for each). In contrast, the 'neckline' visual attribute can be applied only to objects of the 'top' type among clothing, and is therefore a specialized visual attribute for 'tops'.

In one embodiment, although not shown in the drawings, the method further includes a step in which the server acquires object type information for the acquired training image data. The training image data transmission step then includes the server transmitting all of the acquired training image data to the client of an expert in charge of classifying a general-purpose visual attribute, while transmitting to the client of an expert in charge of classifying a specialized visual attribute only the training image data in which an object of a type to which that specialized visual attribute applies appears. This is described in detail below.

The server acquires object type information for the training image data. That is, it acquires type information for the one or more objects appearing in the training image data.

In one embodiment, the object type information includes upper type information or lower type information of a particular object. For example, the type information for a 'shirt' may be 'clothing-top-shirt' or simply 'shirt'. In the latter case, the lower type information 'shirt' may include information on the upper type information 'clothing-top'.

In one embodiment, when a particular visual attribute concerns the type of the object, such as 'object type' or 'clothing type', that is, when the type classification criterion that distinguishes object types is itself a classification criterion falling under the concept of visual attribute, acquiring the object type information may amount to acquiring the individual characteristic for that visual attribute (the type classification criterion).

In another embodiment, acquiring the object type information includes the server acquiring it from an expert client in charge of type classification or through a separate type recognition model. However, the method of acquiring object type information is not limited to the embodiments described above.

That is, the type classification criterion that distinguishes object types may be a classification criterion falling under the concept of visual attribute, or a separate classification criterion independent of the visual attributes.

In one embodiment, in the training image data transmission step, the server transmits all of the acquired training image data to the client of an expert in charge of classifying a general-purpose visual attribute, but transmits to the client of an expert in charge of classifying a specialized visual attribute only the training image data in which an object of a type to which that specialized visual attribute applies appears.

That is, in one embodiment, based on the acquired object type information, the server transmits to the expert client in charge of a specialized visual attribute only the training image data in which an object of a type to which that attribute can be applied appears. This minimizes unnecessary data transmission and thereby improves the efficiency of building the training dataset.

For example, in one embodiment, for a general-purpose visual attribute such as 'color', which applies regardless of object type, the server transmits all acquired training image data of the clothing type to the expert client in charge of 'color'. In contrast, because 'neckline' is a specialized visual attribute that applies only to objects of the 'top' type, the server uses the acquired type information to transmit only the clothing-type training image data in which a 'top' appears to the expert client in charge of 'neckline'. This reduces the transmission of unnecessary training image data and improves the efficiency of building the training dataset. Specialized and general-purpose visual attributes have been described above using clothing objects in clothing images as an example, but the present invention is not limited thereto and can of course be applied to the various types of objects (surgical tools, organs, exercise equipment, hands, and so on) that appear in image data of various image types, such as surgical images and exercise images.
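The routing rule described above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the names `VisualAttribute`, `TrainingImage`, and `route_images`, and the convention that an empty `applies_to` set marks a general-purpose attribute, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualAttribute:
    name: str
    # Assumption: an empty set marks a general-purpose attribute;
    # otherwise it lists the object types the specialized attribute applies to.
    applies_to: frozenset = frozenset()


@dataclass
class TrainingImage:
    image_id: str
    object_types: set  # object types appearing in the image, e.g. {"top"}


def route_images(images, attributes):
    """Return {attribute name: [image ids]} to send to each attribute's expert client."""
    routing = {}
    for attr in attributes:
        if not attr.applies_to:
            # General-purpose attribute: every image is sent.
            routing[attr.name] = [img.image_id for img in images]
        else:
            # Specialized attribute: only images showing an applicable object type.
            routing[attr.name] = [
                img.image_id for img in images
                if img.object_types & attr.applies_to
            ]
    return routing


images = [TrainingImage("img1", {"top"}), TrainingImage("img2", {"pants"})]
attributes = [
    VisualAttribute("color"),
    VisualAttribute("neckline", frozenset({"top"})),
]
routing = route_images(images, attributes)
```

In this sketch the 'color' expert receives both images, while the 'neckline' expert receives only the image in which a top appears.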

Next, the server receives from the expert client the individual characteristic for the visual attribute of each piece of training image data (S300).

In one embodiment, the server receives, from the expert client in charge of a particular visual attribute, the individual characteristic of the training image data for that visual attribute.

For example, when the server has transmitted a particular piece of training image data showing a top to a plurality of expert clients, it may receive the individual characteristic 'clothing' from the expert client in charge of 'object type', 'shirt' from the expert client in charge of 'clothing type', and 'sleeveless' from the expert client in charge of 'sleeve length'.

In one embodiment, in the individual characteristic reception step (S300), depending on the nature of the visual attribute, the server may receive a separate individual characteristic for each object appearing in the image data, or may receive an individual characteristic covering a plurality of objects. That is, for one visual attribute applied to one piece of training image data, one or more individual characteristics may be obtained depending on the nature of the visual attribute.

In one embodiment, the individual characteristic reception step (S300) may consist of the server receiving, from the expert client in charge of a detailed visual attribute, the detailed individual characteristic of the training image data for that detailed visual attribute. That is, the visual attribute combination set for a particular image type is composed of detailed visual attributes, and accordingly the training image data may be transmitted only to the expert clients in charge of those detailed visual attributes.

A detailed visual attribute is the lowest-level visual attribute under a given visual attribute. For example, in 'clothing (object type)-shirt (clothing type)-sleeveless (sleeve length)', 'sleeve length' is the detailed visual attribute and 'sleeveless' is the detailed individual characteristic.

A lower visual attribute is one that is lower relative to its upper visual attribute. In one embodiment, the relationships among visual attributes may be set or changed by the server. In addition, one upper visual attribute may have one or more lower visual attributes.

In one embodiment, a detailed individual characteristic may include individual characteristic information for one or more upper visual attributes of the detailed visual attribute to which it belongs.

For example, the detailed individual characteristic 'sleeveless' may include information on 'clothing' or 'top', which are individual characteristics of upper visual attributes of the detailed visual attribute 'sleeve length'. That is, even if the server receives only the detailed individual characteristic 'sleeveless', it can obtain the information 'clothing' or 'top', the individual characteristics for the upper visual attributes.

In one embodiment, the set of detailed visual attributes may change as lower visual attributes are added. For example, when 'neckline' is additionally set as a lower visual attribute of the 'clothing type' visual attribute ('object type (clothing)-clothing type (shirt)-{neckline (V-neck), sleeve length (sleeveless)}'), both 'neckline' and 'sleeve length' are detailed visual attributes. The visual attributes, including the detailed visual attributes, can be freely set and changed by the server.

That is, the server transmits the training image data only to the expert clients in charge of the detailed visual attributes, excluding the upper visual attributes, and receives the detailed individual characteristics from them. Because the training image data is sent to the minimum number of expert clients, the efficiency of building the training dataset is increased.

For example, when a particular piece of training image data is transmitted only to the expert clients in charge of the detailed visual attributes 'neckline' and 'sleeve length', and 'V-neck' and 'sleeveless' are received as detailed individual characteristics, those detailed individual characteristics include information on 'clothing' and 'top', the individual characteristics of the upper visual attributes, so sufficient visual characteristics can be collected.
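One way to picture how a detailed individual characteristic carries its upper individual characteristics is a simple parent lookup. The sketch below is an assumption-laden illustration: the `PARENT` mapping and the function `expand_with_upper` are hypothetical and are not described in the source, which does not specify how the upper information is stored.

```python
# Hypothetical parent links: each individual characteristic points to the
# individual characteristic of its upper visual attribute.
PARENT = {
    "sleeveless": "top",
    "V-neck": "top",
    "top": "clothing",
}


def expand_with_upper(characteristic):
    """Return the characteristic followed by all of its upper characteristics."""
    chain = [characteristic]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


# Only the detailed individual characteristics are received from the experts.
received = ["V-neck", "sleeveless"]
collected = {c for r in received for c in expand_with_upper(r)}
```

Here `collected` contains 'V-neck', 'sleeveless', 'top', and 'clothing', even though only the two detailed characteristics were received.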

Next, the server builds the training dataset based on the plurality of individual characteristics received for each piece of training image data (S400).

In one embodiment, the training dataset is one in which each piece of training image data is matched with the individual characteristics labeled for it under a plurality of visual attributes, and it is used to train the visual characteristic recognition model (100).

Also, in one embodiment, the training dataset may include a plurality of individual training datasets, each for training one individual attribute recognition module (110). An individual training dataset is one in which each piece of training image data is matched with the individual characteristic labeled for one particular visual attribute (the visual attribute judged by the individual attribute recognition module being trained).

Specifically, when the server intends to train the individual attribute recognition module (110) for visual attribute A, it extracts from the training dataset the individual training dataset for visual attribute A (a dataset in which each piece of training image data is matched only with its individual characteristic for visual attribute A) and inputs it to a deep learning model. In this way, the server can train and build each individual attribute recognition module (110) capable of recognizing the individual characteristics of its visual attribute.
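The extraction of an individual training dataset can be sketched as a projection of the full dataset onto one visual attribute. The record layout (`image` and `labels` keys), the sample values, and the function name `extract_individual_dataset` are assumptions for illustration only.

```python
# Hypothetical training dataset: each image is matched with the individual
# characteristics received for several visual attributes.
dataset = [
    {"image": "img1.jpg",
     "labels": {"color": "light pink", "pattern": "floral",
                "sleeve length": "sleeveless"}},
    {"image": "img2.jpg",
     "labels": {"color": "navy", "pattern": "striped"}},
]


def extract_individual_dataset(dataset, attribute):
    """Keep only (image, individual characteristic) pairs for one visual attribute."""
    return [
        (sample["image"], sample["labels"][attribute])
        for sample in dataset
        if attribute in sample["labels"]  # skip images not labeled for this attribute
    ]


color_dataset = extract_individual_dataset(dataset, "color")
sleeve_dataset = extract_individual_dataset(dataset, "sleeve length")
```

Each resulting list would then be fed to the training of the corresponding individual attribute recognition module; images that have no label for the attribute are simply left out.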

Also, in one embodiment, for a visual attribute that is a classification criterion for describing the visual features of a plurality of objects, the individual training dataset for that visual attribute may be one in which a plurality of individual characteristics are matched with each piece of training image data. For example, the individual training dataset for the 'surgical motion' visual attribute may be one in which the individual characteristics for the 'surgical tool' and 'organ (surgical target)' visual attributes are matched with the training image data at the same time.

In one embodiment, when the individual training dataset for a particular visual attribute is too small, the server may additionally acquire training image data of an image type to which that visual attribute applies, transmit it to the expert client in charge of the visual attribute, and receive the individual characteristics, thereby building out the individual training dataset for that visual attribute.

In another embodiment, the training dataset may be built by generating appearance description data based on the plurality of individual characteristics received for the training image data.

That is, the training dataset is built by combining the individual characteristics received for a plurality of visual attributes of a particular piece of training image data into appearance description data, and when a particular individual attribute recognition module is to be trained, the corresponding individual training dataset is extracted from the appearance description data.

Specifically, when the server intends to train the individual attribute recognition module for visual attribute A, it can extract, from the training dataset (appearance description data combining a plurality of individual characteristics for visual attributes A, B, and C for each piece of training image data), an individual training dataset in which the training image data is matched only with the individual characteristic for visual attribute A, and use it to train the individual attribute recognition module.

Hereinafter, the appearance description data according to an embodiment of the present invention is described in detail using clothing images as an example.

In one embodiment, the appearance description data includes a combination of the plurality of individual characteristics for the training image data. For example, it may be combined as "shirt (clothing type), light pink (color), floral (pattern), slim (top silhouette), V-neck (neckline), sleeveless (sleeve length)". However, the form of the appearance description data is not limited thereto and includes any form that combines the acquired individual characteristics.

In one embodiment, the appearance description data includes appearance description data in the form of a code string, obtained by extracting and combining the code values corresponding to the plurality of individual characteristics for the training image data. That is, by encoding the individual characteristics, the appearance description data can be generated as a code string, which allows it to be processed efficiently.

For example, when the code values corresponding to the acquired individual characteristics are "shirt-Zb01, light pink-Ob01, floral-Ie01, slim-Ba01, V-neck-Bb02, sleeveless-Bg01", the appearance description data in code-string form may be "Ba01, Bb02, Bg01, Ie01, Ob01, Zb01".
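The code-string example above can be reproduced with a small lookup table. In this sketch the code values are taken from the example, but the name `CODE_TABLE`, the function `to_code_string`, and the choice to sort the codes alphabetically are assumptions; the source only shows a code string that happens to be in that order.

```python
# Code values from the example in the text.
CODE_TABLE = {
    "shirt": "Zb01",
    "light pink": "Ob01",
    "floral": "Ie01",
    "slim": "Ba01",
    "V-neck": "Bb02",
    "sleeveless": "Bg01",
}


def to_code_string(characteristics):
    """Map individual characteristics to code values and join them in sorted order."""
    # Assumption: codes are ordered alphabetically, as in the example string.
    codes = sorted(CODE_TABLE[c] for c in characteristics)
    return ", ".join(codes)


description = to_code_string(
    ["shirt", "light pink", "floral", "slim", "V-neck", "sleeveless"]
)
print(description)  # Ba01, Bb02, Bg01, Ie01, Ob01, Zb01
```

Keeping the codes in a fixed order means two images with the same set of characteristics produce identical strings, whatever order the experts' answers arrive in.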

Referring to FIG. 2, in one embodiment, the server receives from the expert clients a plurality of individual characteristics for a particular piece of training image data, then extracts and combines the code values corresponding to those individual characteristics to generate appearance description data in code-string form.

In another embodiment, the server receives from the expert clients the code values corresponding to a plurality of individual characteristics for a particular piece of training image data, and combines those code values to generate appearance description data in code-string form.

Also, in one embodiment, the code value corresponding to a lower individual characteristic includes upper individual characteristic information for one or more upper visual attributes of the lower visual attribute to which that lower individual characteristic belongs.

For example, in the code value 'Bb02' corresponding to 'V-neck', the "B" may carry information on "clothing", which is an individual characteristic of an upper visual attribute (object type).

FIG. 4 is a flowchart of a dataset construction method for the case where a new visual attribute is added, according to an embodiment of the present invention. Referring to FIG. 4, in one embodiment, when a new visual attribute is added, the method further includes: an additional training image data transmission step (S500) in which the server transmits the plurality of pieces of training image data included in the training dataset to the client of the expert in charge of classifying the added new visual attribute; a step (S600) in which the server receives from that expert client a new individual characteristic for the new visual attribute of each piece of training image data; and a step (S700) in which the server extracts the code values corresponding to the received new individual characteristics and adds them to the appearance description data of each piece of training image data.

The server transmits the plurality of pieces of training image data included in the training dataset to the client of the expert in charge of classifying the added new visual attribute (S500). That is, in one embodiment, when a new visual attribute is additionally set and a new individual training dataset is needed to train a new individual attribute recognition module, the training image data included in the already-built training dataset may be transmitted to the expert client in charge of the new visual attribute. Alternatively, newly acquired training image data may be transmitted instead.

In one embodiment, step S500 transmits to the expert client only the training image data of the image type to which the new visual attribute applies. That is, when a new visual attribute is added for a visual feature to be additionally captured for a particular image type, the image data of that image type is extracted and transmitted.

For example, when individual characteristics for the 'shoulder shape (Shoulder)' visual attribute are to be additionally acquired for clothing-type images, only the clothing-type training image data among the plurality of pieces of training image data in the training dataset is transmitted, and surgical and exercise images are not, which minimizes unnecessary data transmission.

In another embodiment, the method may further include a step of determining the image type to which the newly added visual attribute is to be applied. That is, when the new visual attribute can also be applied to image types other than the one originally intended, the image types to which it will be applied are determined.

In another embodiment, when the added new visual attribute is a lower visual attribute, the additional training image data transmission step includes the server transmitting to the client of the expert in charge of classifying the new visual attribute only those pieces of training image data, among the plurality in the training dataset, whose appearance description data contains a code value corresponding to an individual characteristic of the new visual attribute's upper visual attribute.

For example, suppose the added new visual attribute is 'shoulder shape (Shoulder)', which is a lower visual attribute under 'top', an upper individual characteristic of the upper visual attribute (clothing type), and is a specialized visual attribute applicable only to objects of the 'top' type. In that case, from the plurality of pieces of training image data in the already-built training dataset, the server extracts the training image data whose appearance description data contains one of the code values 'Zb01' to 'Zb03' corresponding to the upper individual characteristic 'top', and transmits only that data to the expert client in charge of 'shoulder shape', which minimizes unnecessary data transmission.
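Selecting the images to resend for a new lower visual attribute amounts to filtering on the upper characteristic's code values. In the sketch below, the range 'Zb01' to 'Zb03' comes from the example above; the dataset layout, the code 'Zc01' used for a non-top item, and the function name `select_for_new_attribute` are hypothetical.

```python
# Code values for the upper individual characteristic 'top', per the example.
TOP_CODES = {"Zb01", "Zb02", "Zb03"}

# Hypothetical dataset: image id -> codes in its appearance description data.
# 'Zc01' is an invented code standing in for a non-top clothing type.
dataset = {
    "img1": ["Ba01", "Bb02", "Zb01"],
    "img2": ["Ob01", "Zc01"],
    "img3": ["Ie01", "Zb03"],
}


def select_for_new_attribute(dataset, upper_codes):
    """Return ids of images whose description contains any of the upper codes."""
    return [
        image_id for image_id, codes in dataset.items()
        if upper_codes & set(codes)
    ]


to_send = select_for_new_attribute(dataset, TOP_CODES)
```

Only 'img1' and 'img3' would be forwarded to the 'shoulder shape' expert client; 'img2' has no top code and is skipped.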

The server receives from the expert client the new individual characteristic for the new visual attribute of each piece of training image data (S600), then extracts the code values corresponding to the received new individual characteristics and adds them to the appearance description data of each piece of training image data (S700).

In one embodiment, the training dataset can be updated simply by adding only the code value corresponding to the new individual characteristic to the appearance description data of each piece of training image data in the already-built training dataset.

For example, suppose the appearance description data of a particular piece of training image data in the already-built training dataset is "Ba01, Bb02, Bg01, Ie01, Ob01, Zb01", the training image data is transmitted to the expert client in charge of the new visual attribute 'shoulder shape (Shoulder)', the new individual characteristic 'plain shoulder' is received, and its corresponding code value is 'Bd01'. The training dataset can then be updated simply by adding only that code value to the existing appearance description data, giving "Ba01, Bb02, Bd01, Bg01, Ie01, Ob01, Zb01".
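The update in this example can be sketched as inserting one code into an already sorted code string. The function name `add_code` and the use of sorted insertion are assumptions; the source only shows that the new code 'Bd01' ends up between 'Bb02' and 'Bg01'.

```python
import bisect


def add_code(code_string, new_code):
    """Insert new_code into a comma-separated, sorted code string."""
    codes = [c.strip() for c in code_string.split(",") if c.strip()]
    if new_code not in codes:  # avoid duplicating a code that is already present
        bisect.insort(codes, new_code)
    return ", ".join(codes)


existing = "Ba01, Bb02, Bg01, Ie01, Ob01, Zb01"
updated = add_code(existing, "Bd01")
print(updated)  # Ba01, Bb02, Bd01, Bg01, Ie01, Ob01, Zb01
```

Because only one code is inserted, the codes already collected for the other visual attributes are left untouched.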

In one embodiment, when the image data is video data including a plurality of frames, the training dataset construction step may build the dataset by acquiring individual characteristics for each frame in the video data.

Specifically, a plurality of individual characteristics may be received for each frame in the video data, and the appearance description data for each frame may be generated and arranged in frame order.

Also, in one embodiment, the training dataset may be built by acquiring dynamic individual characteristics, according to dynamic visual attributes, for a plurality of frames in the video data.

Specifically, for surgical images, the individual characteristics of visual attributes that describe the appearance of an object in a still state, such as 'surgical tool' and 'organ', are acquired for each frame in the video data, whereas the dynamic individual characteristics of dynamic visual attributes that describe the movement of an object, such as 'movement of the surgical tool', are acquired for the plurality of frames in a particular segment. The training dataset is built from both.
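The distinction between per-frame and per-segment labels can be sketched with a small annotation container. Everything here is hypothetical: the class `VideoAnnotation`, its methods, and the sample values ('forceps', 'grasping') are illustrative assumptions, since the source does not specify a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class VideoAnnotation:
    video_id: str
    # frame index -> {static visual attribute: individual characteristic}
    frame_labels: dict = field(default_factory=dict)
    # (start frame, end frame, dynamic visual attribute, dynamic characteristic)
    segment_labels: list = field(default_factory=list)

    def label_frame(self, frame, attribute, characteristic):
        """Record a static individual characteristic for a single frame."""
        self.frame_labels.setdefault(frame, {})[attribute] = characteristic

    def label_segment(self, start, end, attribute, characteristic):
        """Record a dynamic individual characteristic for an inclusive frame range."""
        if start > end:
            raise ValueError("start frame must not exceed end frame")
        self.segment_labels.append((start, end, attribute, characteristic))

    def labels_at(self, frame):
        """Return every label that applies to the given frame."""
        labels = dict(self.frame_labels.get(frame, {}))
        for start, end, attribute, characteristic in self.segment_labels:
            if start <= frame <= end:
                labels[attribute] = characteristic
        return labels


ann = VideoAnnotation("surgery_001")
ann.label_frame(0, "surgical tool", "forceps")
ann.label_frame(1, "surgical tool", "forceps")
ann.label_segment(0, 1, "movement of the surgical tool", "grasping")
```

Static attributes are stored once per frame, while a dynamic attribute is stored once per segment and resolved for any frame inside it.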

An apparatus for constructing a dataset for training a visual characteristic recognition model according to yet another embodiment of the present invention includes one or more computers and performs the training dataset construction method described above.

The method of constructing a dataset for training a visual characteristic recognition model according to the present invention, described above, may be implemented as a program (or application) and stored in a medium so as to be executed in combination with a computer, which is hardware.

The program described above may include code written in a computer language such as C, C++, JAVA, or machine language that the processor (CPU) of the computer can read through a device interface of the computer, so that the computer reads the program and executes the methods implemented as the program. Such code may include functional code related to functions that define the functions necessary to execute the methods, and may include execution-procedure control code necessary for the processor of the computer to execute those functions according to a predetermined procedure. Such code may further include memory-reference code indicating at which location (address) in the internal or external memory of the computer the additional information or media needed by the processor to execute the functions should be referenced. In addition, when the processor of the computer needs to communicate with any other remote computer or server in order to execute the functions, the code may further include communication-related code specifying how to communicate with the remote computer or server (10) using the communication module of the computer, and what information or media should be transmitted and received during communication.

The storage medium refers not to a medium that stores data for a short moment, such as a register, cache, or memory, but to a medium that stores data semi-permanently and can be read by a device. Specifically, examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. That is, the program may be stored in various recording media on various servers (10) that the computer can access, or in various recording media on the user's computer. In addition, the medium may be distributed over network-connected computer systems so that computer-readable code is stored in a distributed manner.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, a person of ordinary skill in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical idea or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

Claims

A method of constructing a dataset for training a visual characteristic recognition model, the method comprising:
a training image data acquisition step in which a server acquires a plurality of pieces of training image data, each including one or more objects;
a training image data transmission step in which the server transmits the training image data to a plurality of expert clients according to the image type of each piece of training image data, wherein each expert client is a client of an expert in charge of classifying a respective visual attribute in a visual attribute combination;
an individual characteristic reception step in which the server receives, from the expert clients, individual characteristics for the visual attributes of each piece of training image data; and
a training dataset construction step in which the server generates appearance description data for each piece of training image data based on the plurality of individual characteristics received for that piece of training image data, thereby constructing a training dataset,
wherein the visual attribute is a specific classification criterion for describing visual characteristics of the object, and includes a plurality of individual characteristics expressing various visual characteristics within the same classification criterion,
wherein the visual attribute combination is a set of a plurality of visual attributes to be determined for a specific image type,
wherein the training dataset is used to train one or more visual characteristic recognition models, and
wherein the visual characteristic recognition model is trained using the plurality of pieces of training image data included in the training dataset and one or more individual characteristics extracted from the appearance description data corresponding thereto.

The method of claim 1,
wherein the visual characteristic recognition model includes a plurality of individual attribute recognition modules that determine different visual attributes,
wherein each of the plurality of individual attribute recognition modules is trained based on an individual training dataset corresponding to that individual attribute recognition module, and
wherein the individual training dataset matches, to the plurality of pieces of training image data, the individual characteristics for a specific visual attribute extracted from the appearance description data.

The method of claim 1,
wherein the visual attribute combination is a combination of a plurality of detailed visual attributes, each being a lowest-level visual attribute belonging to a respective visual attribute.

The method of claim 1,
wherein the appearance description data is in the form of a code string obtained by extracting and combining code values corresponding to the plurality of individual characteristics of the training image data.

5. The method of claim 4,
wherein the code value corresponding to a detailed individual characteristic includes individual characteristic information on one or more upper-level visual attributes of the detailed visual attribute.

The method of claim 1, further comprising,
when a new visual attribute is added:
an additional training image data transmission step in which the server transmits the plurality of pieces of training image data included in the training dataset to a client of an expert in charge of classifying the added new visual attribute;
a step in which the server receives, from the expert client, a new individual characteristic for the new visual attribute of each piece of training image data; and
a step in which the server extracts code values corresponding to the plurality of received new individual characteristics and adds them to the appearance description data of each piece of training image data.

7. The method of claim 6,
further comprising a step in which the server adds the new visual attribute to the visual attribute combination for one or more image types,
wherein, in the additional training image data transmission step,
only the training image data of the image type to which the new visual attribute has been added is transmitted.

The method of claim 1,
wherein the visual attributes include a dynamic visual attribute for describing the movement of a specific object, and
wherein, when the image data is video data including a plurality of frames,
the training dataset construction step
constructs the training dataset by acquiring individual characteristics for each frame in the video data and acquiring dynamic individual characteristics for a plurality of frames.

A server device for constructing a dataset for training a visual characteristic recognition model, the server device comprising one or more computers and executing the method of any one of claims 1 to 8.

A program for constructing a dataset for training a visual characteristic recognition model, the program being combined with a computer, which is hardware, and stored in a medium in order to execute the method of any one of claims 1 to 8.