KR102187741B1

KR102187741B1 - Metadata crowd sourcing system and method

Info

Publication number: KR102187741B1
Application number: KR1020180148540A
Authority: KR
Inventors: 강현수; 신형욱; 변정
Original assignee: 주식회사 코난테크놀로지
Priority date: 2018-11-27
Filing date: 2018-11-27
Publication date: 2020-12-07
Also published as: KR20200062761A

Abstract

A metadata crowdsourcing system and method for building high-quality metadata are disclosed. A metadata crowdsourcing system according to an embodiment includes: a video metadata tagging unit that extracts images for each scene section of video data, extracts metadata for each image, and stores and manages the results; a crowdsourcing processing unit that generates questions from the images and metadata stored in the video metadata tagging unit, receives answers to the generated questions from users, calculates user reliability and question type accuracy, and from these cumulatively manages metadata accuracy and updates the metadata accordingly; a service interface providing unit that presents the questions generated by the crowdsourcing processing unit to an unspecified number of users and receives their answers; and a metadata learning model optimization unit that updates and optimizes the metadata stored in the video metadata tagging unit according to the metadata accuracy.

Description

Metadata crowd sourcing system and method

The present invention relates to a metadata crowdsourcing system and method for building high-quality metadata.

With the advance of deep learning technology, research and services that apply deep learning are under way in many fields. Image recognition using deep learning in particular has improved greatly over pre-deep-learning performance and continues to develop.

However, metadata extracted automatically with publicly available deep learning models is difficult to use without correction because of errors such as incorrect tags and missing information. High-quality metadata is therefore usually created manually, which requires considerable manpower, time, and cost.

Crowdsourcing, meanwhile, is a way of devising ideas and solving problems that an individual, group, or company needs through the participation of an unspecified number of people. Crowdsourcing has been used in many academic, industrial, social, and economic fields; reCAPTCHA v1, reCAPTCHA v2, reCAPTCHA v3, and Amazon Mechanical Turk are representative metadata tagging tools that use crowdsourcing.

However, metadata tagging techniques that use such crowdsourcing still have various shortcomings: for example, questions that humans cannot perceive are posed, or copyright issues arise.

According to an embodiment, a metadata crowdsourcing system and method are proposed that convert automatically tagged metadata into high-quality metadata by using collective intelligence.

A metadata crowdsourcing system according to an embodiment includes: a video metadata tagging unit that extracts images for each scene section of video data, extracts metadata for each image, and stores and manages the results; a crowdsourcing processing unit that generates questions from the images and metadata stored in the video metadata tagging unit, receives answers to the generated questions from users, calculates user reliability and question type accuracy, and from these cumulatively manages metadata accuracy and updates the metadata accordingly; a service interface providing unit that presents the questions generated by the crowdsourcing processing unit to an unspecified number of users and receives their answers; and a metadata learning model optimization unit that updates and optimizes the metadata stored in the video metadata tagging unit according to the metadata accuracy.

The video metadata tagging unit may perform media management, including registration and deletion of the video data and extraction of video sections in units of scenes, and metadata extraction, including object recognition, background recognition, face recognition, and caption generation, on the images extracted for each scene section.

The service interface providing unit may be implemented in JavaScript so as to present questions and receive answers through a pop-up window, and the design of the user interface may be provided by Cascading Style Sheets (CSS).

The user reliability may be determined based on how the user selected the high-reliability metadata among the user's inputs for the presented question, and the question type accuracy may be determined based on how the user selected the low-reliability metadata among those inputs.

The metadata accuracy may be calculated by applying the previously accumulated number of correct answers and the number of correct answers for the current question to the user reliability and the question type accuracy.

In addition, the learning model optimization unit may optimize the metadata by applying transfer learning or active learning.

A metadata crowdsourcing method according to another embodiment includes: a video metadata tagging step of extracting images for each scene section of video data, extracting metadata for each image, and storing and managing the results; a crowdsourcing processing step of generating questions from the stored images and metadata, receiving answers to the generated questions from users, calculating user reliability and question type accuracy, and from these cumulatively managing metadata accuracy and updating the metadata accordingly; and a metadata learning model optimization step of updating and optimizing the stored metadata according to the metadata accuracy.

The video metadata tagging step may perform media management, including registration and deletion of the video data and extraction of video sections in units of scenes, and metadata extraction, including object recognition, background recognition, face recognition, and caption generation, on the images extracted for each scene section.

In addition, the learning model optimization step may optimize the metadata by applying transfer learning or active learning.

According to the present invention, the tendency of conventional systems to pose questions that humans cannot perceive can be reduced, and a comprehensive answer set can be built. In addition, because the whole process is automated, difficult preparation procedures can be omitted and the quality of newly generated data can be managed. Furthermore, the newly generated data can be used to gradually improve the performance of a previously trained deep learning model without human intervention.

In addition, the high-quality metadata generated through user participation is fed into a learning model optimization process based on transfer learning and active learning, so that the quality of the automatically generated metadata also improves.

FIG. 1 is a configuration diagram of a metadata crowdsourcing system according to an embodiment of the present invention;
FIG. 2 is a diagram for explaining the operation of a video metadata tagging unit according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an example of a video metadata tagging result;
FIG. 4 is a diagram illustrating an example of a pop-up screen generated by a service interface providing unit according to an embodiment of the present invention;
FIG. 5 is a diagram for explaining a processing procedure in a crowdsourcing processing unit according to an embodiment of the present invention;
FIG. 6 is a diagram for explaining the detailed configuration and operation of a metadata learning model optimization unit according to an embodiment of the present invention; and
FIG. 7 is a flowchart of a metadata crowdsourcing method according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the invention pertains; the invention is defined only by the scope of the claims. Like reference numerals refer to like components throughout the specification.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intention or custom of users or operators. Their definitions should therefore be based on the content of this specification as a whole.

Combinations of the blocks of the attached block diagrams and the steps of the flowcharts may be performed by computer program instructions (an execution engine). Because these computer program instructions can be loaded onto a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, the instructions executed by that processor create means for performing the functions described in each block of the block diagrams or each step of the flowcharts.

These computer program instructions may also be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment to implement a function in a particular manner, so that the instructions stored in that memory can produce an article of manufacture containing instruction means that perform the functions described in each block of the block diagrams or each step of the flowcharts.

The computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on that equipment to produce a computer-executed process; the instructions that run on the computer or other programmable data processing equipment can thus provide steps for executing the functions described in each block of the block diagrams and each step of the flowcharts.

In addition, each block or step may represent a module, segment, or portion of code that contains one or more executable instructions for performing the specified logical functions. It should be noted that in some alternative embodiments the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially simultaneously, or may be performed in the reverse order of the corresponding functions as needed.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The embodiments illustrated below may be modified into various other forms, and the scope of the present invention is not limited to them. The embodiments are provided to explain the present invention more completely to those of ordinary skill in the art.

FIG. 1 is a configuration diagram of a metadata crowdsourcing system according to an embodiment of the present invention.

The metadata crowdsourcing system includes a video metadata tagging unit 110, a crowdsourcing processing unit 120, a service interface providing unit 130, and a metadata learning model optimization unit 140.

The video metadata tagging unit 110 extracts images for each scene section of the video data, extracts metadata for each image, and stores and manages the results. To this end it performs media management, including registration and deletion of video data and extraction of video sections in units of scenes, and metadata extraction, including object recognition, background recognition, face recognition, and caption generation, on the images extracted for each scene section.

The crowdsourcing processing unit 120 generates questions from the images and metadata stored in the video metadata tagging unit 110, receives answers to the generated questions from users, calculates user reliability and question type accuracy, and from these cumulatively manages metadata accuracy and updates the metadata accordingly. Here, the user reliability is determined based on how the user selected the high-reliability metadata among the user's inputs for the presented question, and the question type accuracy may be determined based on how the user selected the low-reliability metadata among those inputs. The metadata accuracy may be calculated by applying the previously accumulated number of correct answers and the number of correct answers for the current question to the user reliability and the question type accuracy.

The service interface providing unit 130 presents the questions generated by the crowdsourcing processing unit 120 to an unspecified number of users and receives their answers. To this end it may be implemented in JavaScript so as to present questions and receive answers through a pop-up window, and the design of the user interface may be provided by Cascading Style Sheets (CSS).

The metadata learning model optimization unit 140 updates and optimizes the metadata stored in the video metadata tagging unit 110 according to the metadata accuracy. For example, the metadata may be optimized by applying transfer learning or active learning. Each component is described in detail below with reference to FIGS. 2 to 7.

FIG. 2 is a diagram for explaining the operation of a video metadata tagging unit according to an embodiment of the present invention.

The video metadata tagging unit 110 performs a media management function, which includes registering and deleting videos and extracting scene sections, and a metadata extraction function, which extracts images from the video and performs object recognition, background recognition, face recognition, and caption generation. The video metadata may be generated using, for example, deep-learning-based object, background, and face recognition techniques and a single-image caption generation technique.

In more detail, when a user registers a video through a web-based user interface, the video metadata tagging server 210 calls the scene section extraction server 220, which extracts images from the video at, for example, 10 fps. The extracted images are sent in turn to the metadata extraction server 230, which comprises, for example: an object recognition server based on Faster R-CNN trained on the COCO data set; a background recognition server based on a CNN trained on the MIT Places database; a face recognition server based on a CNN trained on the FACENET data set; and a caption generation server based on an attention LSTM that combines a CNN and an RNN, trained on the COCO data set. The extracted metadata is stored in the database 240, and the information extracted for each image is displayed visually to the user.
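The flow above (sample frames at a fixed rate, pass each frame through several recognizers, collect the results per frame) can be sketched as follows. This is a minimal illustration, not the patent's implementation: `FrameMetadata`, `sample_timestamps`, and `tag_frames` are hypothetical names, and the recognizers are stand-in callables that receive a timestamp instead of a decoded image.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FrameMetadata:
    """Metadata collected for one extracted frame (hypothetical structure)."""
    timestamp_s: float
    tags: Dict[str, List[str]] = field(default_factory=dict)


def sample_timestamps(duration_s: float, fps: float = 10.0) -> List[float]:
    """Return the timestamps (in seconds) at which frames are extracted."""
    count = int(duration_s * fps)
    return [i / fps for i in range(count)]


def tag_frames(
    timestamps: List[float],
    extractors: Dict[str, Callable[[float], List[str]]],
) -> List[FrameMetadata]:
    """Run every extractor (object, place, face, caption, ...) on each frame."""
    results = []
    for t in timestamps:
        meta = FrameMetadata(timestamp_s=t)
        for name, extractor in extractors.items():
            # Each extractor stands in for one recognition server.
            meta.tags[name] = extractor(t)
        results.append(meta)
    return results


# Stand-in extractors; a real system would call trained models here.
demo = tag_frames(
    sample_timestamps(0.5),
    {"object": lambda t: ["grape"], "place": lambda t: ["kitchen"]},
)
```

Keeping the extractors behind a common callable signature mirrors the description's separate recognition servers: a recognizer can be added or swapped without touching the sampling loop.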

FIG. 3 is a diagram illustrating an example of a video metadata tagging result.

Referring to FIG. 3, a list is displayed for each scene section, and the tagging results for the representative image of each section are displayed grouped into persons, objects, places, and so on.

FIG. 4 is a diagram illustrating an example of a pop-up screen generated by a service interface providing unit according to an embodiment of the present invention.

FIG. 4 shows an example of a pop-up screen to which the JS+CSS package is applied. The metadata crowdsourcing system for building high-quality metadata according to an embodiment of the present invention needs a user interface that makes collective intelligence practical. It can therefore be built on a web client usable on various platforms such as mobile devices, PCs, and tablets. The system is designed to be easy to apply to any web page. To this end it may consist of, for example, two files: a JavaScript (JS) file responsible for the pop-up user interface and a Cascading Style Sheets (CSS) file responsible for the design of the user interface. With these, the pop-up appears at the point where the user wants it displayed and receives the user's input. The input is sent to the crowdsourcing server, where user reliability is measured. Metadata that is highly reliable and has been accumulated from many users is used by the learning model optimization unit, so that automatic metadata tagging performance can be continuously trained and improved.

FIG. 5 is a diagram for explaining a processing procedure in a crowdsourcing processing unit according to an embodiment of the present invention.

The crowdsourcing processing unit may comprise a question generation unit, an answer checking unit, and a database (not shown). With reference to FIG. 5, the overall flow is described below, from posing a question to the user, through processing the data entered by the user, to reflecting the result in the metadata.

First, when a user accesses the service and requests a question (510), the question generation unit poses a question to the user (520). Question generation consists of determining the question type, composing the question, and composing the answer set. Low-reliability items are drawn at random from the automatically extracted metadata, and the question type is determined from the drawn items. Once the question type is determined, only the drawn items that suit the question type are gathered to compose the question shown to the user. High-reliability items from the automatically extracted metadata are also included in the question, so that the answer checking unit can judge how far the user can be trusted.
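The question generation steps above can be sketched as follows. This is an illustrative reading under stated assumptions, not the patent's implementation: the names `Item`, `Question`, and `build_question`, the 0.8 confidence threshold, and the rule "question type = most frequent label among the drawn items" are all hypothetical choices.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Item:
    image_id: str
    label: str         # automatically extracted tag, e.g. "grape"
    confidence: float  # confidence reported by the automatic tagger


@dataclass
class Question:
    label: str                # question type: "select images containing <label>"
    to_verify: List[Item]     # low-confidence items whose tag is being checked
    controls: List[Item]      # high-confidence items with the same label
    distractors: List[Item]   # high-confidence items with a different label


def build_question(
    items: List[Item],
    threshold: float = 0.8,   # assumed split between low and high confidence
    n_low: int = 4,
    n_controls: int = 3,
    n_distractors: int = 4,
    rng: Optional[random.Random] = None,
) -> Optional[Question]:
    rng = rng or random.Random()
    low = [it for it in items if it.confidence < threshold]
    high = [it for it in items if it.confidence >= threshold]
    if not low:
        return None  # nothing left to verify
    # 1. Randomly draw low-confidence items.
    drawn = rng.sample(low, min(n_low, len(low)))
    # 2. Decide the question type from the drawn items (assumed rule).
    label = Counter(it.label for it in drawn).most_common(1)[0][0]
    # 3. Keep only the drawn items that fit the question type.
    to_verify = [it for it in drawn if it.label == label]
    # 4. Mix in high-confidence items used to check the user.
    same = [it for it in high if it.label == label]
    other = [it for it in high if it.label != label]
    controls = rng.sample(same, min(n_controls, len(same)))
    distractors = rng.sample(other, min(n_distractors, len(other)))
    return Question(label, to_verify, controls, distractors)
```

Passing a seeded `random.Random` as `rng` makes the draw reproducible, which is convenient when testing the selection logic.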

After an answer is received from the user (530), the answer checking unit processes the answer (540). For example, it judges how far the values entered for the presented question can be trusted and updates the values in the database. In detail, the answer checking unit performs user reliability analysis, user input analysis, and database management and updating.

User reliability analysis judges how accurately the user selected the high-reliability items among the inputs for the question composed by the question generation unit. Suppose, for example, that the user is shown two low-reliability grape pictures, three high-reliability grape pictures, and four high-reliability apple pictures, and is asked to select the pictures that contain grapes. Regardless of whether the low-reliability grape pictures were selected, the system checks whether the user picked all three high-reliability grape pictures, or missed grape pictures and selected apple pictures, and combines the result with the user's existing reliability to determine the user reliability. The user reliability may be calculated, for example, as follows.

User reliability = (number of required correct answers - number of incorrect answers) / (number of required correct answers)
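As a worked illustration of the formula, the sketch below scores the grape and apple example. The function names are hypothetical, and the way incorrect answers are counted (missed high-reliability grape pictures plus selected high-reliability apple pictures) is an assumption; the description does not spell out the counting rule.

```python
from typing import Set


def user_reliability(required_correct: int, wrong: int) -> float:
    """(required correct answers - incorrect answers) / required correct answers."""
    if required_correct <= 0:
        raise ValueError("at least one required correct answer is needed")
    return (required_correct - wrong) / required_correct


def count_wrong(selected: Set[str], controls: Set[str], distractors: Set[str]) -> int:
    """Assumed counting rule: missed controls plus selected distractors.

    Low-reliability pictures are ignored here, as in the description.
    """
    missed = len(controls - selected)
    false_picks = len(distractors & selected)
    return missed + false_picks


controls = {"g1", "g2", "g3"}           # high-reliability grape pictures
distractors = {"a1", "a2", "a3", "a4"}  # high-reliability apple pictures
selected = {"g1", "g2", "a1", "low1"}   # the user's picks

# One missed grape and one selected apple give 2 incorrect answers.
score = user_reliability(len(controls), count_wrong(selected, controls, distractors))
```

Under this counting rule the score can fall below zero when a user makes more mistakes than there are required answers; whether to clamp it is left open by the description.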

User input analysis uses the user reliability to estimate how accurately the user selected the low-reliability items among the values entered. Input from a highly reliable user can be assumed to be correct with high probability, but wrong input can still arrive for various reasons, such as a lack of knowledge or a misunderstanding of the question, and this can harm the reliability of the actual metadata. The accuracy of low-reliability metadata may be estimated, for example, as follows.

Question type accuracy = (number of correct answers - number of incorrect answers) / (total number of entries of the question type input by the user)

Metadata accuracy = (number of existing correct answers + number of correct answers for the current question) / (total number of questions) × question type accuracy × user reliability
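The two formulas translate directly into code. The sketch below is a plain transcription with hypothetical function and parameter names; the input numbers in the example are made up for illustration.

```python
def question_type_accuracy(correct: int, wrong: int, total_entered: int) -> float:
    """(correct - incorrect) / total entries of this question type from the user."""
    if total_entered <= 0:
        raise ValueError("total_entered must be positive")
    return (correct - wrong) / total_entered


def metadata_accuracy(
    prev_correct: int,
    new_correct: int,
    total_questions: int,
    type_accuracy: float,
    reliability: float,
) -> float:
    """(existing correct + current correct) / total questions
    x question type accuracy x user reliability."""
    if total_questions <= 0:
        raise ValueError("total_questions must be positive")
    ratio = (prev_correct + new_correct) / total_questions
    return ratio * type_accuracy * reliability


# Illustrative numbers only: 4 correct and 1 incorrect out of 5 entries,
# 7 earlier correct answers plus 1 new one over 10 questions, fully reliable user.
qta = question_type_accuracy(correct=4, wrong=1, total_entered=5)
acc = metadata_accuracy(7, 1, 10, type_accuracy=qta, reliability=1.0)
```

Because the three factors are multiplied, a low user reliability or a low question type accuracy pulls the metadata accuracy down even when most accumulated answers were correct.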

As a result of the answer processing, it is determined whether the answer is correct (550). If it is not, a question is posed to the user again; if it is, the user reliability score and the answer to the question are updated (560). That is, the user reliability and the metadata accuracy calculated as above are accumulated onto the existing values and stored in the user DB and the metadata DB, respectively. Highly reliable metadata is then extracted and reflected in the metadata DB (570). For example, even if the current accuracy of a metadata item is high, the automatically extracted metadata is not replaced until a variety of values from highly reliable users has accumulated. The data produced by collective intelligence is used only once it has become as trustworthy as possible.
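The replacement rule in this paragraph (keep the automatic tag until both the accuracy is high and enough reliable users have contributed) can be sketched as a small gate. The class name `TagRecord` and every threshold below are hypothetical; the description gives no concrete values.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TagRecord:
    auto_label: str        # tag produced by the automatic tagger
    crowd_label: str       # tag proposed by crowd answers
    accuracy: float = 0.0  # latest metadata accuracy for the crowd tag
    reliable_users: Set[str] = field(default_factory=set)

    def record_answer(
        self,
        user_id: str,
        user_reliability: float,
        accuracy: float,
        min_user_reliability: float = 0.8,  # assumed threshold
    ) -> None:
        """Store the newest accuracy and remember reliable contributors."""
        self.accuracy = accuracy
        if user_reliability >= min_user_reliability:
            self.reliable_users.add(user_id)

    def current_label(self, min_accuracy: float = 0.9, min_users: int = 5) -> str:
        """Replace the automatic tag only when both conditions hold."""
        enough_users = len(self.reliable_users) >= min_users
        if self.accuracy >= min_accuracy and enough_users:
            return self.crowd_label
        return self.auto_label


record = TagRecord(auto_label="apple", crowd_label="grape")
record.record_answer("u1", user_reliability=1.0, accuracy=0.95)
label_after_one_user = record.current_label()  # still the automatic tag
```

Tracking contributors in a set means repeated answers from the same user do not count twice toward the user threshold, which is one way to read "a variety of values from highly reliable users".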

FIG. 6 is a diagram for explaining the detailed configuration and operation of a metadata learning model optimization unit according to an embodiment of the present invention.

Once a variety of high-quality metadata has been built through collective intelligence, the model needs to be regenerated using the newly built data. Building a model from scratch, however, takes a long time and reduces user convenience and system utilization. It is therefore necessary to reuse the existing model rather than create one from scratch, and to carry out these steps automatically. The metadata learning model optimization unit 140 is configured to do this automatically in conjunction with the video metadata tagging unit 110 and the crowdsourcing processing unit 120. Specifically, it comprises a transfer learning unit 610 and an active learning execution unit 620.

When there is no correct-answer label for an object or thing, the transfer learning unit 610 extracts similar features from objects or things that do have a correct answer and estimates the unlabeled item to be a specific object or thing. Automatically tagged metadata can have very low accuracy, and even when a crowdsourcing system is used, there is a limit to the amount that can be processed if there are too many low-accuracy objects. The transfer learning unit 610 therefore extracts the features of the low-accuracy items among the automatically tagged metadata and determines which object or thing in the pre-trained model the extracted features resemble. After the features are compared against the pre-trained model, items with high similarity are judged to be the corresponding object or thing and the automatically extracted metadata is updated, so that an accurate answer can be given when a request for a similar object or thing arrives later.
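The feature comparison described above can be sketched as a nearest-prototype match. The patent does not name a similarity measure or a cutoff, so cosine similarity, the `threshold` value, and the names `cosine_similarity` and `match_known_class` are all assumptions made for illustration.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_known_class(features: Sequence[float],
                      prototypes: Dict[str, Sequence[float]],
                      threshold: float = 0.85) -> Tuple[Optional[str], float]:
    """Return (label, score) for the most similar known class, or (None, score)
    when no class reaches the assumed similarity threshold."""
    best_label, best_score = None, 0.0
    for label, proto in prototypes.items():
        score = cosine_similarity(features, proto)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score
    return None, best_score
```

Here `prototypes` stands in for per-class feature vectors taken from a pre-trained model. An item that returns `None` is one for which no sufficiently similar known object exists.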

For objects or things whose correct answer is ambiguous, the active learning execution unit 620 asks an oracle for the correct answer so that an accurate label is available for use as training data. When the transfer learning unit 610 has compared the features of the previously learned data with those of the ambiguous groups and found no object or thing with similar features, active learning is used to request correct-answer labels for groups whose answer is ambiguous but whose members share similar features. The label is requested from the oracle; in the crowdsourcing system for building high-quality metadata of the present invention, the oracle is the entity that enters data into the crowdsourcing system, that is, a person. Once a person has provided the correct answer for such a group, the automatically extracted metadata is updated in the same way, and the result is used as training data to improve the performance of automatic metadata extraction.
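The hand-off to the human oracle can be sketched as follows. This is a simplified illustration under stated assumptions: each item is assumed to already carry a group identifier and a best similarity score, and the names `build_label_requests`, `apply_oracle_label`, and the `threshold` value are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_label_requests(items: List[dict], threshold: float = 0.85) -> List[dict]:
    """Group items whose best similarity score is below the assumed threshold
    and emit one labeling request per group of similar items."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        if item["best_score"] < threshold:
            groups[item["group"]].append(item["id"])
    return [{"group": g, "item_ids": ids} for g, ids in sorted(groups.items())]


def apply_oracle_label(metadata: Dict[str, str], request: dict,
                       label: str) -> List[Tuple[str, str]]:
    """Write the human-provided label to every item in the group and return
    (item_id, label) pairs that could be appended to the training data."""
    for item_id in request["item_ids"]:
        metadata[item_id] = label
    return [(item_id, label) for item_id in request["item_ids"]]
```

Asking once per group rather than once per item keeps the number of crowd questions small, which fits the concern about limited processing capacity raised for the transfer learning unit.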

FIG. 7 is a flowchart of a metadata crowdsourcing method according to an embodiment of the present invention.

The metadata crowdsourcing method of the present invention includes a video metadata tagging step 710, a crowdsourcing processing step 720, and a metadata learning model optimization step 730.

In the video metadata tagging step 710, an image is extracted for each scene section of the video data, and metadata for each image is extracted, stored, and managed. More specifically, this step performs media management, which includes registering and deleting video data and extracting video sections in units of scenes, and extracts metadata, including object recognition, background recognition, face recognition, and caption generation, from the image of each extracted scene section.
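One way to picture the output of step 710 is a per-scene record holding the results of each extractor. The sketch below assumes scene boundaries are already known and treats the recognizers as caller-supplied functions; `SceneMetadata`, `tag_scenes`, and the field names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SceneMetadata:
    start_sec: float                # scene section start time
    end_sec: float                  # scene section end time
    image_ref: str                  # reference to the image extracted for the scene
    tags: Dict[str, object] = field(default_factory=dict)  # extractor name -> result


def tag_scenes(scenes: List[Tuple[float, float, str]],
               extractors: Dict[str, Callable[[str], object]]) -> List[SceneMetadata]:
    """Run every extractor (e.g. object, background, face, caption) on each scene image."""
    results = []
    for start, end, image_ref in scenes:
        tags = {name: extract(image_ref) for name, extract in extractors.items()}
        results.append(SceneMetadata(start, end, image_ref, tags))
    return results
```

Passing the extractors in as a dictionary keeps the sketch independent of any particular recognition model.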

In the crowdsourcing processing step 720, a problem is generated from the stored images and metadata, an answer to the generated problem is received from a user, and the user reliability and the problem type accuracy are calculated; on that basis the metadata accuracy is cumulatively managed and the metadata is updated accordingly. Here, the user reliability is determined based on the degree to which high-reliability metadata is selected among the user's input values for the presented problem, and the problem type accuracy may be determined based on the degree to which low-reliability metadata is selected among the user's input values. The metadata accuracy may be calculated by reflecting the number of previously accumulated correct answers and the number of correct answers for the current problem in the user reliability and the problem type accuracy.
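The three scores can be illustrated with placeholder formulas. This passage does not state the actual formulas, so the ratios and the weighting below are assumptions chosen only to show how the quantities could relate; `selection_ratio`, `score_answer`, and `update_metadata_accuracy` are hypothetical names.

```python
from typing import Iterable, Tuple


def selection_ratio(selected: Iterable[str], candidates: Iterable[str]) -> float:
    """Fraction of the candidate metadata values that the user selected."""
    candidates = set(candidates)
    if not candidates:
        return 0.0
    return len(set(selected) & candidates) / len(candidates)


def score_answer(selected: Iterable[str], high_conf: Iterable[str],
                 low_conf: Iterable[str]) -> Tuple[float, float]:
    """Assumed scoring: reliability from high-reliability picks,
    problem type accuracy from low-reliability picks."""
    selected = set(selected)
    user_reliability = selection_ratio(selected, high_conf)
    problem_type_accuracy = selection_ratio(selected, low_conf)
    return user_reliability, problem_type_accuracy


def update_metadata_accuracy(prev_correct: float, prev_total: int,
                             correct_now: int, total_now: int,
                             user_reliability: float,
                             problem_type_accuracy: float) -> Tuple[float, int, float]:
    """Assumed update: weight the new correct answers by both scores,
    then add them to the accumulated counts."""
    weight = user_reliability * problem_type_accuracy
    new_correct = prev_correct + weight * correct_now
    new_total = prev_total + total_now
    accuracy = new_correct / new_total if new_total else 0.0
    return new_correct, new_total, accuracy
```

Under these placeholder formulas, an answer from a user who misses the high-reliability items contributes little to the accumulated accuracy, which is the qualitative behavior the paragraph describes.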

In the metadata learning model optimization step 730, the stored metadata is updated and optimized according to the calculated metadata accuracy. For example, the metadata may be optimized by applying transfer learning or active learning; the specific method is as described above with reference to FIG. 6.

The present invention has been described above with a focus on its embodiments. A person of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered from an illustrative rather than a limiting point of view. The scope of the present invention is set forth in the claims rather than in the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

110 Video metadata tagging unit
120 Crowdsourcing processing unit
130 Service interface providing unit
140 Metadata learning model optimization unit

Claims

A metadata crowdsourcing system comprising:
a video metadata tagging unit that extracts an image for each scene section from video data, and extracts, stores, and manages metadata for each image;
a crowdsourcing processing unit that generates a problem using the images and metadata stored in the video metadata tagging unit, receives an answer to the generated problem from a user, calculates user reliability and problem type accuracy, cumulatively manages metadata accuracy on that basis, and updates the metadata accordingly;
a service interface providing unit that presents the problem generated by the crowdsourcing processing unit to an unspecified number of users and receives answers to the problem; and
a metadata learning model optimization unit that updates and optimizes the metadata stored in the video metadata tagging unit according to the metadata accuracy.

The system of claim 1, wherein the video metadata tagging unit
performs media management, including registration and deletion of the video data and extraction of video sections for each scene, and extracts metadata, including object recognition, background recognition, face recognition, and caption generation, from the extracted scene section images.

The system of claim 1, wherein the service interface providing unit
is implemented in JavaScript so as to present problems and receive answers through a pop-up window.

The system of claim 3, wherein the service interface providing unit
provides the design of its user interface using Cascading Style Sheets.

The system of claim 1,
wherein the user reliability is determined based on the degree to which high-reliability metadata is selected among the user input values for the presented problem, and the problem type accuracy is determined based on the degree to which low-reliability metadata is selected among the user input values.

The system of claim 1,
wherein the metadata accuracy is calculated by reflecting the number of previously accumulated correct answers and the number of correct answers to the corresponding problem in the user reliability and the problem type accuracy.

The system of claim 1, wherein the metadata learning model optimization unit
optimizes the metadata by applying transfer learning or active learning.

A metadata crowdsourcing method comprising:
a video metadata tagging step of extracting an image for each scene section from video data, and extracting, storing, and managing metadata for each image;
a crowdsourcing processing step of generating a problem using the stored images and metadata, receiving an answer to the generated problem from a user, calculating user reliability and problem type accuracy, cumulatively managing metadata accuracy on that basis, and updating the metadata accordingly; and
a metadata learning model optimization step of updating and optimizing the stored metadata according to the metadata accuracy.

The method of claim 8, wherein the video metadata tagging step
performs media management, including registration and deletion of the video data and extraction of video sections for each scene, and extracts metadata, including object recognition, background recognition, face recognition, and caption generation, from the extracted scene section images.

The method of claim 8,
wherein the user reliability is determined based on the degree to which high-reliability metadata is selected among the user input values for the generated problem, and the problem type accuracy is determined based on the degree to which low-reliability metadata is selected among the user input values.

The method of claim 8,
wherein the metadata accuracy is calculated by reflecting the number of previously accumulated correct answers and the number of correct answers to the corresponding problem in the user reliability and the problem type accuracy.

The method of claim 8, wherein the metadata learning model optimization step
optimizes the metadata by applying transfer learning or active learning.