KR102015939B1

KR102015939B1 - Method, apparatus and program for sampling a learning target frame image of video for image learning of artificial intelligence and image learning method thereof

Info

Publication number: KR102015939B1
Application number: KR1020180114986A
Authority: KR
Inventors: 박민우
Original assignee: 주식회사 크라우드웍스
Priority date: 2018-09-27
Filing date: 2018-09-27
Publication date: 2019-08-28
Also published as: WO2020067632A1; US20210241031A1; US11295169B2; JP2020052999A; JP6715358B2

Abstract

Provided are a method, an apparatus, and a program for sampling learning target frame images from a video for artificial intelligence image learning, and an image learning method using the same. The computer-executed method for sampling learning target frame images from a video for artificial intelligence image learning comprises: a raw video reception step of receiving a raw video for artificial intelligence image learning; a frame image extraction step of extracting a predetermined number of frame images per predetermined time interval from the received raw video; a learning target object detection step of detecting, by using an object detection algorithm, one or more learning target objects of one or more predetermined types on each frame image; a background removal step of removing the background other than the learning target objects from each frame image; an object movement amount measurement step of measuring a movement amount of each learning target object on an n-th (n is a natural number greater than or equal to two) frame image from which the background has been removed; and a learning target frame image selection step of selecting the n-th frame image as a learning target frame image by comparing the measured movement amount of each of the one or more learning target objects detected on the n-th frame image with a predetermined criterion.

Description

METHOD, APPARATUS AND PROGRAM FOR SAMPLING A LEARNING TARGET FRAME IMAGE OF VIDEO FOR IMAGE LEARNING OF ARTIFICIAL INTELLIGENCE AND IMAGE LEARNING METHOD THEREOF

The present invention relates to a method for sampling learning target frame images from a video for artificial intelligence image learning, and to an apparatus, a program, and an image learning method therefor.

Artificial intelligence (AI) means intelligence produced by a machine. It is a field of computer science and information technology that studies how to enable computers to perform the thinking, learning, and similar tasks that human intelligence can perform; enabling a computer to imitate intelligent human behavior is called artificial intelligence.

Artificial intelligence has been under steady research and development, moving from image intelligence to voice and text intelligence. Research and development on video image intelligence is now under way and growing rapidly, and the industrial ripple effect of video image intelligence is very large.

In the process of creating training data for artificial intelligence learning, preprocessing the acquired data takes about 70 to 80% of the time spent creating the training data.

In addition, the amount of video image data is tens to hundreds of times larger than the amount of conventional image or audio data.

Korean Registered Patent Publication No. 10-1888647, 2018.08.08.

Compared with generating training data from conventional image or audio data, generating training data from video image data has the problem that the time and cost of data preprocessing increase greatly because of the huge data volume.

Accordingly, an object of the present invention is to provide a method for sampling learning target frame images from a video for artificial intelligence image learning that can minimize the time and cost of data preprocessing.

The problems to be solved by the present invention are not limited to the problem mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the above problem, a method for sampling learning target frame images from a video for artificial intelligence image learning according to an embodiment of the present invention is a method performed by a computer and includes: a raw video reception step of receiving a raw video for artificial intelligence image learning; a frame image extraction step of extracting a predetermined number of frame images per predetermined time interval from the received raw video; a learning target object detection step of detecting, by using an object detection algorithm, one or more learning target objects of one or more predetermined types on each of the frame images; a background removal step of removing the background other than the learning target objects from each of the frame images; an object movement amount measurement step of measuring a movement amount of each of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) by comparing the positions of the one or more learning target objects detected on the background-removed n-th frame image with the positions of the one or more learning target objects detected on the background-removed (n-1)-th frame image immediately preceding the n-th frame image; and a learning target frame image selection step of selecting the n-th frame image as a learning target frame image by comparing the measured movement amount of each of the one or more learning target objects detected on the n-th frame image with a predetermined criterion.

In the learning target frame image selection step, when the movement amount of at least a predetermined number of the detected one or more learning target objects is greater than or equal to the predetermined criterion, the n-th frame image is selected as the learning target frame image; otherwise, the n-th frame image is not selected as the learning target frame image.
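The selection rule above can be illustrated with a minimal sketch. The function name `select_frame` and the parameters `min_movement` (the predetermined criterion) and `min_count` (the predetermined number of objects) are hypothetical names chosen for illustration; the source does not specify units or concrete values.

```python
from typing import Sequence


def select_frame(movements: Sequence[float], min_movement: float, min_count: int) -> bool:
    """Return True when at least `min_count` objects moved by `min_movement` or more.

    `movements` holds one measured movement amount per detected object.
    Both thresholds are illustrative assumptions, not values from the source.
    """
    moved = sum(1 for m in movements if m >= min_movement)
    return moved >= min_count


# Two of the three objects moved at least 10 units, so the frame is selected.
print(select_frame([0.5, 12.0, 30.0], min_movement=10.0, min_count=2))  # True
```

Counting objects rather than averaging their movement keeps one fast-moving object from masking several static ones, which matches the "predetermined number or more" wording of the rule.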

To solve the above problem, the method for sampling learning target frame images from a video for artificial intelligence image learning according to an embodiment of the present invention further includes a training video set generation step of processing the selected learning target frame images to generate a training video set.

In the training video set generation step, the learning target frame image may be a frame image from the frame image extraction step, before the one or more learning target objects are detected. In that case, the training video set generation step further includes an object detection step of detecting the one or more learning target objects of the one or more predetermined types on the learning target frame image, and an annotation processing step of annotating the one or more learning target objects detected on the learning target frame image.

Alternatively, in the training video set generation step, the learning target frame image may be a frame image on which the one or more learning target objects were detected in the learning target object detection step. In that case, the training video set generation step further includes an annotation processing step of annotating the one or more learning target objects detected on the learning target frame image.
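The two variants of the training video set generation step can be sketched as follows, assuming a frame either already carries its detection results or carries none and must be passed through the detector again before annotation. The `Frame` and `Detection` types, the record layout, and the `detect` callable are all hypothetical; the source does not define an annotation format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed box layout


@dataclass
class Detection:
    label: str
    box: Box


@dataclass
class Frame:
    index: int
    # None means the frame was stored before object detection was run on it.
    detections: Optional[List[Detection]] = None


def build_training_set(
    selected: List[Frame],
    detect: Callable[[Frame], List[Detection]],
) -> List[Dict]:
    """Annotate each selected frame, running detection first only when needed."""
    records = []
    for frame in selected:
        dets = frame.detections if frame.detections is not None else detect(frame)
        annotations = [{"label": d.label, "box": d.box} for d in dets]
        records.append({"frame": frame.index, "annotations": annotations})
    return records
```

Keeping the detector as an injected callable lets the same function serve both variants: frames that already hold detections skip the call, and pre-detection frames trigger it.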

To solve the above problem, an artificial intelligence image learning method according to an embodiment of the present invention is a method performed by a computer and includes performing artificial intelligence image learning using a training video set, wherein the training video set is generated by a method for sampling learning target frame images from a video for artificial intelligence image learning, the method including: a raw video reception step of receiving a raw video for artificial intelligence image learning; a frame image extraction step of extracting a predetermined number of frame images per predetermined time interval from the received raw video; a learning target object detection step of detecting, by using an object detection algorithm, one or more learning target objects of one or more predetermined types on each of the frame images; a background removal step of removing the background other than the learning target objects from each of the frame images; an object movement amount measurement step of measuring a movement amount of each of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) by comparing the positions of the one or more learning target objects detected on the background-removed n-th frame image with the positions of the one or more learning target objects detected on the background-removed (n-1)-th frame image immediately preceding the n-th frame image; a learning target frame image selection step of selecting the n-th frame image as a learning target frame image by comparing the measured movement amount of each of the one or more learning target objects detected on the n-th frame image with a predetermined criterion; and a training video set generation step of processing the selected learning target frame images to generate the training video set.

To solve the above problem, an apparatus for sampling learning target frame images from a video for artificial intelligence image learning according to an embodiment of the present invention includes: a raw video reception unit that receives a raw video for artificial intelligence image learning; a frame image extraction unit that extracts a predetermined number of frame images per predetermined time interval from the received raw video; a learning target object detection unit that detects, by using an object detection algorithm, one or more learning target objects of one or more predetermined types on each of the frame images; a background removal unit that removes the background other than the learning target objects from each of the frame images; an object movement amount measurement unit that measures a movement amount of each of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) by comparing the positions of the one or more learning target objects detected on the background-removed n-th frame image with the positions of the one or more learning target objects detected on the background-removed (n-1)-th frame image immediately preceding the n-th frame image; and a learning target frame image selection unit that selects the n-th frame image as a learning target frame image by comparing the measured movement amount of each of the one or more learning target objects detected on the n-th frame image with a predetermined criterion.

To solve the above problem, the apparatus for sampling learning target frame images from a video for artificial intelligence image learning according to an embodiment of the present invention further includes a training video set generation unit that processes the selected learning target frame images to generate a training video set.

To solve the above problem, an artificial intelligence image learning apparatus according to an embodiment of the present invention includes an artificial intelligence image learning unit that performs artificial intelligence image learning using a training video set, wherein the training video set is generated by an apparatus for sampling learning target frame images from a video for artificial intelligence image learning, the apparatus including: a raw video reception unit that receives a raw video for artificial intelligence image learning; a frame image extraction unit that extracts a predetermined number of frame images per predetermined time interval from the received raw video; a learning target object detection unit that detects, by using an object detection algorithm, one or more learning target objects of one or more predetermined types on each of the frame images; a background removal unit that removes the background other than the learning target objects from each of the frame images; an object movement amount measurement unit that measures a movement amount of each of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) by comparing the positions of the one or more learning target objects detected on the background-removed n-th frame image with the positions of the one or more learning target objects detected on the background-removed (n-1)-th frame image immediately preceding the n-th frame image; a learning target frame image selection unit that selects the n-th frame image as a learning target frame image by comparing the measured movement amount of each of the one or more learning target objects detected on the n-th frame image with a predetermined criterion; and a training video set generation unit that processes the selected learning target frame images to generate the training video set.

To solve the above problem, a program for sampling learning target frame images from a video for artificial intelligence image learning according to an embodiment of the present invention is stored in a recording medium in order to execute any one of the methods described above using a computer, which is hardware.

Other specific details of the present invention are included in the detailed description and the drawings.

According to the present invention, when generating training data from a video, only part of the video data is selected as training data, so the time and cost of data preprocessing can be minimized.

In addition, according to the present invention, unnecessary data can be reduced and only the necessary data can be selected when choosing training data.

In addition, according to the present invention, the learning target object is distinguished from the background, so the amount of change of the learning target object can be measured accurately.

The effects of the present invention are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a flowchart illustrating a method for sampling learning target frame images from a video for artificial intelligence image learning according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a method for measuring the movement amount of a learning target object according to the present invention.
FIG. 3 is a diagram illustrating a method for selecting a learning target frame image according to the present invention.
FIG. 4 is a diagram illustrating a learning target frame image sampling process according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method for sampling learning target frame images from a video for artificial intelligence image learning that includes a step of generating a training video set.
FIG. 6 is a flowchart illustrating the step of generating a training video set when the learning target frame image is a frame image before the learning target object is detected.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the present invention belongs; the present invention is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components other than those mentioned. Like reference numerals refer to like components throughout the specification, and "and/or" includes each of the mentioned components and every combination of one or more of them. Although "first", "second", and the like are used to describe various components, these components are of course not limited by these terms. These terms are used only to distinguish one component from another. Therefore, a first component mentioned below may of course be a second component within the technical spirit of the present invention.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meaning commonly understood by those of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are clearly and specifically defined.

The spatially relative terms "below", "beneath", "lower", "above", "upper", and the like may be used to easily describe the relationship between one component and other components as shown in the drawings. Spatially relative terms should be understood as encompassing different orientations of the components in use or operation in addition to the orientation shown in the drawings. For example, when a component shown in a drawing is turned over, a component described as "below" or "beneath" another component may be placed "above" the other component. Thus, the exemplary term "below" can encompass both the below and above orientations. Components may also be oriented in other directions, and spatially relative terms may therefore be interpreted according to the orientation.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method for sampling learning target frame images from a video for artificial intelligence image learning according to an embodiment of the present invention.

Referring to FIG. 1, a method for sampling learning target frame images from a video according to an embodiment of the present invention includes: a raw video reception step (S100) of receiving a raw video for artificial intelligence image learning; a frame image extraction step (S200) of extracting a predetermined number of frame images per predetermined time interval from the received raw video; a learning target object detection step (S300) of detecting one or more learning target objects of one or more predetermined types on each frame image; a background removal step (S400) of removing the background other than the learning target objects from each frame image; a learning target object movement amount measurement step (S500) of measuring the movement amount of each of the one or more learning target objects detected on the n-th frame image; and a learning target frame image selection step (S600) of selecting the n-th frame image as a learning target frame image by comparing the measured movement amount of each of the one or more learning target objects with a predetermined criterion.
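The flow of steps S300 to S600 can be outlined as a short sketch in which each stage is an injected callable. All names here (`sample_learning_frames`, `detect`, `remove_background`, `measure`, `meets_criterion`) are hypothetical placeholders, not components named by the source. How the first frame is treated is not stated in this passage, so the sketch simply starts the comparison at the second frame.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")  # an extracted frame image (output of S200)
D = TypeVar("D")  # detected objects of one frame after background removal


def sample_learning_frames(
    frames: Sequence[F],
    detect: Callable[[F], D],
    remove_background: Callable[[F, D], D],
    measure: Callable[[D, D], List[float]],
    meets_criterion: Callable[[List[float]], bool],
) -> List[int]:
    """Return the 0-based indices of frames selected as learning targets."""
    # S300 and S400: detect objects, then drop everything that is not an object.
    processed = [remove_background(f, detect(f)) for f in frames]
    selected = []
    for n in range(1, len(processed)):
        # S500: compare frame n with the immediately preceding frame n-1.
        movements = measure(processed[n - 1], processed[n])
        # S600: keep frame n only when the movement meets the criterion.
        if meets_criterion(movements):
            selected.append(n)
    return selected
```

For example, treating each frame as a single object position, `sample_learning_frames([0, 1, 10, 11, 20], lambda f: f, lambda f, d: d, lambda a, b: [abs(b - a)], lambda m: m[0] >= 5)` keeps frames 2 and 4, the two frames where the object jumped by 9 units.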

In the raw video reception step (S100), the raw video includes video collected by various cameras and the like.

In one embodiment, when a raw video is received to generate artificial intelligence image training data for autonomous driving of a vehicle, the raw video includes any video from which training data for autonomous driving can be generated, such as vehicle black box (dashboard camera) footage, CCTV footage showing roads on which vehicles can travel, or video acquired from a vehicle fitted with a camera for generating training data for autonomous driving, and is not limited to these examples.

In another embodiment, when a raw video is received to generate artificial intelligence image training data for diagnosing a lesion or disease, the raw video includes video acquired by various medical imaging devices. Examples include computed tomography (CT) images, nuclear magnetic resonance computed tomography (NMR-CT) images, positron emission tomography (PET) images, cone-beam CT (CBCT) images, electron beam tomography images, X-ray images, and magnetic resonance imaging (MRI) images; however, any video acquired by a medical imaging device is included, and the raw video is not limited to these examples.

In yet another embodiment, when a raw video is received to generate artificial intelligence image training data for detecting a crime scene, the raw video includes video acquired by publicly installed CCTV, privately installed CCTV, and the like.

In the frame image extraction step (S200), a number of frame images determined by the computer or by a user setting is extracted from the received raw video.

Extracting a determined number of frame images from the received raw video means extracting a predetermined number of frames within a predetermined time interval. For example, the computer may extract 30 frames per second or 60 frames per second. Any extraction of frames according to a criterion predetermined by the user or the computer is included, and the extraction is not limited to these examples.
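As an illustration of extracting a predetermined number of frames per time interval, the sketch below computes which frame indices of a source video would be kept. The function name `frame_indices` and its parameters are hypothetical, and even spacing of the kept frames is an assumption; the source only states that a predetermined number of frames is extracted per predetermined interval.

```python
from typing import List


def frame_indices(
    total_frames: int,
    source_fps: float,
    frames_per_interval: int,
    interval_seconds: float = 1.0,
) -> List[int]:
    """Return evenly spaced indices of the frames to extract from the source video."""
    if total_frames < 0 or source_fps <= 0 or frames_per_interval <= 0 or interval_seconds <= 0:
        raise ValueError("all arguments must be positive (total_frames may be zero)")
    # Number of source frames between two consecutive extracted frames.
    step = source_fps * interval_seconds / frames_per_interval
    indices: List[int] = []
    k = 0
    while True:
        idx = int(k * step)
        if idx >= total_frames:
            break
        # Skip repeats, which occur when more frames are requested than exist.
        if not indices or idx != indices[-1]:
            indices.append(idx)
        k += 1
    return indices


# A 60 fps source sampled at 30 frames per second keeps every second frame.
print(frame_indices(total_frames=10, source_fps=60.0, frames_per_interval=30))  # [0, 2, 4, 6, 8]
```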

The learning target object detection step (S300) detects learning target objects in each of the extracted frame images. One or more learning target objects are detected in each extracted frame image, and the learning target objects include one or more types.

Kinds of learning target objects include, for example, people, cars, bicycles, buildings, utility poles, motorcycles, trees, flowers, dogs, cats, roads, traffic signs, speed bumps, traffic cones, and lanes. They are not limited to these examples and include anything that can be distinguished as an object.

The types of each learning target object include, for example, front, rear, right side, and left side. The types are not limited to these examples; they may be subdivided more finely than in the examples, or may be classified into entirely different types.

In learning target object detection, one or more objects of one or more types are detected using an object detection algorithm, and the object detection algorithm includes, for example, an R-CNN model.

The background removal step (S400) treats everything on the extracted frame image other than the detected learning target objects as background and removes all of the background.

As a method of removing the background from a frame image, in one embodiment, the region corresponding to the background is removed by setting it to 0 or 1.
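A minimal sketch of this kind of background removal follows, assuming the detected objects are given as axis-aligned bounding boxes and the frame is a two-dimensional grid of pixel values. The box layout `(x, y, width, height)`, the function name `remove_background`, and the `fill` parameter are illustrative assumptions; the source states only that the background region is set to 0 or 1.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed box layout


def remove_background(image: List[List[int]], boxes: Sequence[Box], fill: int = 0) -> List[List[int]]:
    """Return a copy of `image` where every pixel outside all boxes is set to `fill`."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Start from an all-background image, then copy back the object regions.
    out = [[fill] * width for _ in range(height)]
    for x, y, w, h in boxes:
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                out[row][col] = image[row][col]
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
# Keep only the 2x2 object region whose top-left corner is at x=1, y=0.
print(remove_background(image, [(1, 0, 2, 2)]))
# [[0, 2, 3, 0], [0, 6, 7, 0], [0, 0, 0, 0]]
```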

The learning target object movement amount measurement step (S500) measures the movement amount of each of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) by comparing the positions of the one or more learning target objects detected on the background-removed n-th frame image with the positions of the one or more learning target objects detected on the background-removed (n-1)-th frame image immediately preceding the n-th frame image.
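The comparison of positions between the (n-1)-th and n-th frames can be sketched as below. Two assumptions are made that the passage does not state: each object carries an identifier that is stable across frames, and the movement amount is the Euclidean distance between its two positions. The name `measure_movements` is hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def measure_movements(prev: Dict[str, Point], curr: Dict[str, Point]) -> Dict[str, float]:
    """Return the Euclidean displacement of each object present in both frames.

    Objects that appear in only one of the two frames are skipped, which is an
    assumption made for this sketch.
    """
    return {
        obj_id: math.hypot(curr[obj_id][0] - prev[obj_id][0], curr[obj_id][1] - prev[obj_id][1])
        for obj_id in curr
        if obj_id in prev
    }


prev_positions = {"car": (0.0, 0.0), "person": (5.0, 5.0)}
curr_positions = {"car": (3.0, 4.0), "person": (5.0, 5.0), "dog": (1.0, 1.0)}
print(measure_movements(prev_positions, curr_positions))  # {'car': 5.0, 'person': 0.0}
```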

학습 대상 객체 이동량 측정에 관한 구체적인 실시예는 도 2에서 후술한다.A detailed embodiment of measuring the movement amount of the object to be learned will be described later with reference to FIG. 2.

학습 대상 프레임 이미지 선정 단계(S600)는, 제n 프레임 이미지 상의 검출된 하나 이상의 학습 대상 객체 각각의 이동량의 측정 결과와 미리 정해진 기준을 비교하여 제n 프레임 이미지를 학습 대상 프레임 이미지로 선정한다.In the learning target frame image selection step (S600), the n-th frame image is selected as the learning target frame image by comparing a measurement result of a movement amount of each of the detected one or more learning target objects on the n-th frame image with a predetermined criterion.

학습 대상 프레임 이미지 선정에 관한 구체적인 방법은 도 3 및 도 4에서 후술한다.A detailed method for selecting a learning target frame image will be described later with reference to FIGS. 3 and 4.

도 2는 본 발명의 학습 대상 객체의 이동량을 측정하기 위한 방법을 설명하기 위한 도면이다.2 is a view for explaining a method for measuring the amount of movement of a learning object of the present invention.

도 2를 참고하여, 학습 대상 객체 이동량 측정 단계(S500)에서의 이동량 측정 방법에 대하여 설명한다.Referring to FIG. 2, the moving amount measuring method in the learning object moving amount measuring step S500 will be described.

도 2의 (a)는 제n-1 프레임 이미지(11), 도 2의 (b)는 제n 프레임 이미지(12)를 도시하고 있다.FIG. 2A illustrates the n-th frame image 11 and FIG. 2B illustrates the n-th frame image 12.

학습 대상 객체의 이동량 측정은 제n-1 프레임 이미지(11) 상에서의 학습 대상 객체(21)와 제n 프레임 이미지(12) 상에서의 학습 대상 객체(22)의 위치를 비교하는 것이다.The movement amount measurement of the learning object is to compare the positions of the learning object 21 on the n-th frame image 11 and the learning object 22 on the n-th frame image 11.

제n-1 프레임 이미지(11) 상에서의 학습 대상 객체(21)와 제n 프레임 이미지(12) 상에서의 학습 대상 객체(22)는 동일한 형태의 객체로서, 학습 대상 객체의 동일한 위치에 해당하는 부분을 먼저 선정한다.The object to be learned 21 on the n-th frame image 11 and the object to be learned 22 on the n-th frame image 12 are objects of the same type and correspond to portions of the same object of the object to be learned. Select first.

학습 대상 객체의 동일한 위치에 해당하는 부분의 선정은, 컴퓨터가 제n-1 프레임 이미지(11) 상의 학습 대상 객체(21)에서 특정 부분을 A로 선정하였다고 할 때, 제n 프레임 이미지(12) 상의 학습 대상 객체(22)상에서 A와 동일한 위치에 해당하는 부분을 A'로 선정한다.Selection of the part corresponding to the same position of the object to be learned is based on the n-th frame image 12 when the computer selects a specific part A from the object to be learned 21 on the n-th frame image 11. A portion corresponding to the same position as A on the learning object 22 on the image is selected as A '.

컴퓨터는 학습 대상 객체의 동일한 위치에 해당하는 부분을 선정한 후, 제n-1 프레임 이미지(11)와 제n 프레임 이미지(12)가 동일한 평면 상에 놓은 후, A 및 A'에 대한 좌표를 추출한다.After selecting the part corresponding to the same position of the object to be learned, the computer extracts the coordinates for A and A 'after the n-th frame image 11 and the n-th frame image 12 are placed on the same plane. do.

컴퓨터는 A 및 A'에 대한 좌표를 추출한 후, A좌표 및 A'좌표의 차이를 이용하여 이동량을 측정한다.The computer extracts the coordinates for A and A 'and then measures the amount of movement using the difference between the A and A' coordinates.
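The specification states only that the movement amount is obtained from the difference between the coordinates of A and A'. One plausible reading, shown here as an assumption rather than as the claimed method, is the Euclidean distance between the two points once both frame images share one coordinate plane. The function name `movement_amount` is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) on the shared plane of both frame images


def movement_amount(a: Point, a_prime: Point) -> float:
    """Distance between reference point A (frame n-1) and A' (frame n).

    Assumption: the "difference between coordinates" is taken as the
    Euclidean norm of the displacement vector.
    """
    dx = a_prime[0] - a[0]
    dy = a_prime[1] - a[1]
    return math.hypot(dx, dy)
```

Other measures, such as the horizontal or vertical displacement alone, would fit the same description; the choice of norm is left open by the text.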

FIG. 3 is a diagram for explaining a method of selecting a learning target frame image according to the present invention.

Referring to FIG. 3, the method of selecting a learning target frame image determines whether, among the one or more detected learning target objects, at least a predetermined number of learning target objects have a movement amount equal to or greater than a predetermined criterion (S610). If so, the n-th frame image is selected as a learning target frame image (S611); if not, the n-th frame image is not selected as a learning target frame image (S612).
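The decision of steps S610 to S612 can be sketched as a single predicate. The names `should_select_frame`, `threshold`, and `min_count` are illustrative stand-ins for the "predetermined criterion" and "predetermined number" and are not taken from the specification.

```python
from typing import Sequence


def should_select_frame(movements: Sequence[float], threshold: float, min_count: int) -> bool:
    """S610: check whether at least `min_count` objects moved by `threshold` or more.

    Returns True for S611 (select the n-th frame image) and
    False for S612 (do not select it).
    """
    moved = sum(1 for amount in movements if amount >= threshold)
    return moved >= min_count
```

For example, with movement amounts `[0.5, 6.0, 7.2]` and a threshold of `5.0`, the frame is selected when `min_count` is 2 and rejected when it is 3.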

By setting a criterion for the movement amount of learning target objects and selecting as learning target frame images only the frame images that meet or exceed that criterion, unnecessary data can be reduced and only the necessary data selected.

Accordingly, not all extracted frame images are selected as learning target frame images. Frame images in which the objects move little, and which therefore contribute little to learning, are excluded, and only a portion of the extracted frame images is selected. As a result, the learning data set can be produced quickly and accurately even when the amount of data is vast.

Furthermore, because the learning data set is produced with unnecessary data removed, the time required for learning can also be reduced.

FIG. 4 is a diagram for explaining a learning target frame image sampling process according to an embodiment of the present invention.

Referring to FIG. 4, FIG. 4(a) shows a frame image (10) extracted in the frame image extraction step (S200); the frame image (10) includes learning target objects (20) and a background (30).

FIG. 4(b) shows the frame image (10) of FIG. 4(a) after the background (30) has been removed in the background removal step (S400); the frame image (10) now includes only the learning target objects (20).

FIG. 4(c) shows the comparison of the positions of the one or more learning target objects (21) detected on the n-th frame image (n is a natural number of 2 or more) with the positions of the one or more learning target objects (22) detected on the (n-1)-th frame image, which immediately precedes the n-th frame image and from which the background has been removed.

The computer can measure the movement amount of a learning target object by comparing the respective learning target objects (21, 22).

After measuring the movement amounts by comparing the positions of the learning target objects (21, 22), the computer selects the n-th frame image as a learning target frame image if, among the one or more detected learning target objects (20), at least a predetermined number have a movement amount equal to or greater than the predetermined criterion; otherwise, it does not select the n-th frame image as a learning target frame image.

In one embodiment, the computer compares the learning target object detected on the first frame image with the learning target object detected on the second frame image to obtain the movement amount, and selects the second frame image as a learning target frame image if the movement amount is equal to or greater than the predetermined criterion. If the movement amount is below the predetermined criterion, the second frame image is not selected as a learning target frame image.

The step of selecting learning target frame images is carried out by comparing all of the extracted frame images.

Accordingly, the computer does not stop after selecting or not selecting the second frame image as a learning target frame image. It goes on to compare the learning target object detected on the second frame image with the learning target object detected on the third frame image and, if the movement amount is equal to or greater than the predetermined criterion, selects the third frame image as a learning target frame image.

The step of measuring the movement amount of objects on a frame image and selecting or not selecting that frame image as a learning target frame image compares the positions of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) with the positions of those detected on the (n-1)-th frame image, and is repeated until the selection decision has been made for all of the extracted frame images.
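The repeated comparison of each frame image with its immediate predecessor can be sketched as a loop. This sketch makes several assumptions not found in the specification: each frame is represented as a mapping from a hypothetical object identifier to the reference point of that object, objects are matched across frames by that identifier, and movement is measured as Euclidean distance. The function name `sample_frames` is likewise illustrative.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def sample_frames(frames: List[Dict[str, Point]], threshold: float, min_count: int) -> List[int]:
    """Return the 0-based indices of frames selected as learning target frames.

    Each frame n (n >= 1) is compared with frame n-1, regardless of
    whether frame n-1 itself was selected.
    """
    selected: List[int] = []
    for n in range(1, len(frames)):
        previous, current = frames[n - 1], frames[n]
        moved = 0
        for object_id, (x, y) in current.items():
            if object_id not in previous:
                continue  # assumption: objects absent from frame n-1 are skipped
            px, py = previous[object_id]
            if math.hypot(x - px, y - py) >= threshold:
                moved += 1
        if moved >= min_count:
            selected.append(n)
    return selected
```

With five frames in which a single object sits at x = 0, 8, 9, 20, 21 and a threshold of 5, the second and fourth frames (indices 1 and 3) are selected, since only those differ enough from the frame just before them.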

In addition, there may be one or more learning target objects (20); even when a single frame image (10) contains a plurality of learning target objects (20), the computer compares each learning target object (20) to measure its movement amount.

When a single frame image (10) includes a plurality of learning target objects (20), in one embodiment, the computer measures the movement amounts of all of the plurality of learning target objects (20) and selects the frame image (10) as a learning target frame image if the movement amounts of a predetermined number of the learning target objects (20) are equal to or greater than the predetermined criterion.

In another embodiment, the computer measures the movement amounts of all of the plurality of learning target objects (20) and selects the frame image (10) as a learning target frame image if the movement amounts of all of the plurality of learning target objects (20) are equal to or greater than the predetermined criterion.

In yet another embodiment, the computer measures the movement amounts of only a predetermined number of the plurality of learning target objects (20) and selects the frame image (10) as a learning target frame image if, among the measured learning target objects (20), the movement amounts of a predetermined number are equal to or greater than the predetermined criterion.

In yet another embodiment, the computer measures the movement amounts of only a predetermined number of the plurality of learning target objects (20) and selects the frame image (10) as a learning target frame image if the movement amounts of all of the measured learning target objects (20) are equal to or greater than the predetermined criterion.
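The four embodiments above differ along two axes: whether all objects or only a subset are measured, and whether a fixed number or all of the measured objects must meet the criterion. A minimal sketch can cover all four with two optional parameters. The names `select_frame`, `measure_count`, and `required_count` are hypothetical. Which objects form the measured subset is not specified in the text, so this sketch simply takes the first ones, and "a predetermined number" is read here as "at least that many".

```python
from typing import List, Optional


def select_frame(
    movements: List[float],
    threshold: float,
    measure_count: Optional[int] = None,   # None: measure every object
    required_count: Optional[int] = None,  # None: every measured object must qualify
) -> bool:
    """Decide whether a frame with several objects becomes a learning target frame."""
    # Assumption: when only a subset is measured, the first `measure_count` objects are used.
    measured = movements if measure_count is None else movements[:measure_count]
    if not measured:
        return False  # nothing measured, so nothing can meet the criterion

    moved = sum(1 for amount in measured if amount >= threshold)
    needed = len(measured) if required_count is None else required_count
    return moved >= needed
```

Calling `select_frame(m, t, required_count=k)` corresponds to the first embodiment, `select_frame(m, t)` to the second, `select_frame(m, t, measure_count=j, required_count=k)` to the third, and `select_frame(m, t, measure_count=j)` to the fourth.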

The learning target frame images selected through the movement amount measurement of FIG. 4(c) are, as shown in FIG. 4(d), the learning target frame images (12, 14) selected from among the extracted frame images (11, 12, 13, 14, 15).

FIG. 5 is a flowchart for explaining a method of sampling learning target frame images of a video for artificial intelligence image learning that includes a step of generating a learning video set.

FIG. 6 is a flowchart for explaining the step of generating a learning video set when the learning target frame image of the present invention is a frame image from before the learning target objects were detected.

Referring to FIG. 5, the method of sampling learning target frame images of a video according to the present invention further includes a step (S700) of generating a learning video set by processing the selected learning target frame images.

In the step (S700) of generating a learning video set by processing the selected learning target frame images, a learning target frame image may be either a frame image from before the learning target objects were detected or a frame image on which the learning target objects have been detected.

Referring to FIG. 6, when the learning target frame image is a frame image from before the learning target objects were detected, the step (S700) of generating a learning video set includes a step (S710) of detecting one or more learning target objects of one or more predetermined types on the learning target frame image and a step (S720) of annotating the one or more learning target objects detected on the learning target frame image.

The step (S710) of detecting one or more learning target objects of one or more predetermined types on the learning target frame image is carried out in the same manner as the learning target object detection step (S300) of FIG. 1 described above.

A learning target frame image from before detection goes through the object detection step so that the learning target objects on it can subsequently be annotated and the annotated images used to generate the learning video set.

In the step (S720) of annotating the one or more learning target objects detected on the learning target frame image, the annotation includes labeling, coloring, or layering, and any processing that indicates what a learning target object is may be included as annotation.

The one or more learning target objects detected on the learning target frame image are marked so that their regions are delimited, for example in the form of a box.

Accordingly, for each region marked as a learning target object, a description of what the object is can be written. A label may be as simple as a single word, or it may be written in detail as a sentence.

One or more users may perform the labeling directly, and the computer may receive a labeling command from a user and enter the label.

When a user performs the labeling, the user may write the description directly or select from among a plurality of tabs.

In addition, when the computer determines, through learning, that an object currently to be labeled is identical to a previously labeled object, the computer may give the current object the same label as the previous object.
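The labeling flow above, in which a user supplies a label or the computer reuses the label of an object it judges to be the same, can be sketched with a small data structure. Everything named here is hypothetical: the `Annotation` record, the `label_object` function, and in particular `object_key`, which stands in for whatever learned judgment the computer uses to decide that two objects are identical. The specification does not describe that judgment, so a plain dictionary lookup is used as a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Annotation:
    box: Tuple[int, int, int, int]  # region marked on the frame image
    label: str                      # a single word or a full sentence


def label_object(
    box: Tuple[int, int, int, int],
    object_key: str,
    known_labels: Dict[str, str],
    user_label: Optional[str] = None,
) -> Optional[Annotation]:
    """Label one detected object, preferring direct user input.

    `object_key` is a placeholder for the computer's learned notion of
    "the same object as one labeled before".
    """
    if user_label is not None:
        known_labels[object_key] = user_label  # remember for later reuse
        return Annotation(box, user_label)
    if object_key in known_labels:
        return Annotation(box, known_labels[object_key])
    return None  # no label available yet; a user would have to provide one
```

The same structure would apply to coloring or layering by replacing the `label` field with a color or layer value.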

When annotation is performed through coloring or layering, as with labeling, one or more users may perform the coloring or layering directly, and the computer may receive a coloring or layering command from a user and apply it.

In addition, when the computer determines, through learning, that an object currently to be colored or layered is identical to a previously colored or layered object, the computer may color or layer the current object in the same way as the previous object.

Meanwhile, when the learning target frame image is a frame image on which one or more learning target objects have been detected, the step (S700) of generating a learning video set includes the step (S720) of annotating the one or more learning target objects detected on the learning target frame image.

The step (S720) of annotating the one or more detected learning target objects is the same as described above.

An artificial intelligence image learning method according to another embodiment of the present invention includes a step of performing artificial intelligence image learning using a learning video set, where the learning video is generated by the above-described method of sampling learning target frame images of a video for artificial intelligence image learning.

Accordingly, the learning video is generated in the same manner as in the above-described method of sampling learning target frame images of a video for artificial intelligence image learning, and the artificial intelligence image learning method according to this embodiment performs artificial intelligence image learning using the generated learning video set.

An apparatus for sampling learning target frame images of a video for artificial intelligence image learning according to yet another embodiment of the present invention includes a raw video reception unit, a frame image extraction unit, a learning target object detection unit, a background removal unit, an object movement amount measurement unit, and a learning target frame image selection unit.

The raw video reception unit receives a raw video for artificial intelligence image learning.

The frame image extraction unit extracts a predetermined number of frame images per predetermined time interval from the received raw video.
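Extracting a predetermined number of frame images per predetermined time interval amounts to choosing frame indices from the raw video. The sketch below assumes the chosen frames are evenly spaced within each interval, which the specification does not state; the function name `frame_indices` and its parameters are illustrative only.

```python
from typing import List


def frame_indices(total_frames: int, fps: float, interval_sec: float, per_interval: int) -> List[int]:
    """Pick `per_interval` frame indices from every `interval_sec` seconds of video.

    Assumption: frames are spaced evenly inside each interval.
    """
    if fps <= 0 or interval_sec <= 0 or per_interval <= 0:
        raise ValueError("fps, interval_sec and per_interval must be positive")

    frames_per_interval = fps * interval_sec
    step = frames_per_interval / per_interval
    indices: List[int] = []
    start = 0.0
    while start < total_frames:
        for k in range(per_interval):
            index = int(start + k * step)
            # Skip indices past the end of the video and avoid duplicates.
            if index < total_frames and (not indices or index > indices[-1]):
                indices.append(index)
        start += frames_per_interval
    return indices
```

For a 60-frame clip at 30 frames per second, taking 3 frames per 1-second interval yields indices 0, 10, 20, 30, 40, and 50.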

The learning target object detection unit detects one or more learning target objects of one or more predetermined types on each frame image using an object detection algorithm.

The background removal unit removes the background, excluding the learning target objects, from each frame image.

The object movement amount measurement unit compares the positions of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) from which the background has been removed with the positions of the one or more learning target objects detected on the (n-1)-th frame image, which immediately precedes the n-th frame image and from which the background has also been removed, and thereby measures the movement amount of each of the one or more learning target objects detected on the n-th frame image.

The learning target frame image selection unit compares the measured movement amount of each of the one or more learning target objects detected on the n-th frame image with a predetermined criterion and selects the n-th frame image as a learning target frame image based on that comparison.

The apparatus for sampling learning target frame images of a video for artificial intelligence image learning according to yet another embodiment of the present invention further includes a learning video set generation unit that generates a learning video set by processing the selected learning target frame images.

Each component of the apparatus for sampling learning target frame images of a video for artificial intelligence image learning operates as described for the method of sampling learning target frame images of a video for artificial intelligence image learning of FIGS. 1 to 6.

An artificial intelligence image learning apparatus according to yet another embodiment of the present invention includes an artificial intelligence image learning unit that performs artificial intelligence image learning using a learning video set, where the learning video is generated by the above-described apparatus for sampling learning target frame images of a video for artificial intelligence image learning.

The steps of the method or algorithm described in connection with the embodiments of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. A software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the invention may be practiced in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

10: frame image
20: learning target object
30: background

Claims

A method of sampling learning target frame images of a video for artificial intelligence image learning, performed by a computer, the method comprising:
a raw video reception step of receiving a raw video for artificial intelligence image learning;
a frame image extraction step of extracting a predetermined number of frame images per predetermined time interval from the received raw video;
a learning target object detection step of detecting one or more learning target objects of one or more predetermined types on each of the frame images using an object detection algorithm;
a background removal step of removing the background, excluding the learning target objects, from each of the frame images;
a learning target object movement amount measurement step of comparing the positions of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) from which the background has been removed with the positions of the one or more learning target objects detected on the (n-1)-th frame image, which immediately precedes the n-th frame image and from which the background has been removed, to measure the movement amount of each of the one or more learning target objects detected on the n-th frame image; and
a learning target frame image selection step of selecting the n-th frame image as a learning target frame image by comparing the measured movement amount of each of the one or more learning target objects detected on the n-th frame image with a predetermined criterion,
wherein, when a single frame image includes a plurality of learning target objects,
the movement amounts of all of the plurality of learning target objects are measured, and the frame image is selected as the learning target frame image if the measured movement amounts of a predetermined number of the plurality of learning target objects are equal to or greater than the predetermined criterion, or if the measured movement amounts of all of the plurality of learning target objects are equal to or greater than the predetermined criterion, or
the movement amounts of only a predetermined number of the plurality of learning target objects are measured, and the frame image is selected as the learning target frame image if the measured movement amounts of a predetermined number of the measured learning target objects are equal to or greater than the predetermined criterion, or if the measured movement amounts of all of the measured learning target objects are equal to or greater than the predetermined criterion.

(Deleted)

The method of claim 1,
further comprising a learning video set generation step of generating a learning video set by processing the selected learning target frame image.

The method of claim 3,
wherein the learning target frame image in the learning video set generation step
is a frame image, extracted in the frame image extraction step, on which the one or more learning target objects have not yet been detected, and
the learning video set generation step comprises:
a step of detecting one or more learning target objects of the one or more predetermined types on the learning target frame image; and
a step of annotating the one or more learning target objects detected on the learning target frame image.

The method of claim 3,
wherein the learning target frame image in the learning video set generation step
is a frame image on which the one or more learning target objects have been detected in the learning target object detection step, and
the learning video set generation step comprises
a step of annotating the one or more learning target objects detected on the learning target frame image.

An artificial intelligence image learning method performed by a computer, the method comprising:
a step of performing artificial intelligence image learning using a learning video set,
wherein the learning video is generated by a method of sampling learning target frame images of a video for artificial intelligence image learning, the sampling method comprising:
a raw video reception step of receiving a raw video for artificial intelligence image learning;
a frame image extraction step of extracting a predetermined number of frame images per predetermined time interval from the received raw video;
a learning target object detection step of detecting one or more learning target objects of one or more predetermined types on each of the frame images using an object detection algorithm;
a background removal step of removing the background, excluding the learning target objects, from each of the frame images;
a learning target object movement amount measurement step of comparing the positions of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) from which the background has been removed with the positions of the one or more learning target objects detected on the (n-1)-th frame image, which immediately precedes the n-th frame image and from which the background has been removed, to measure the movement amount of each of the one or more learning target objects detected on the n-th frame image;
a learning target frame image selection step of selecting the n-th frame image as a learning target frame image by comparing the measured movement amount of each of the one or more learning target objects detected on the n-th frame image with a predetermined criterion; and
a learning video set generation step of generating the learning video set by processing the selected learning target frame image,
wherein, when a single frame image includes a plurality of learning target objects,
the movement amounts of all of the plurality of learning target objects are measured, and the frame image is selected as the learning target frame image if the measured movement amounts of a predetermined number of the plurality of learning target objects are equal to or greater than the predetermined criterion, or if the measured movement amounts of all of the plurality of learning target objects are equal to or greater than the predetermined criterion, or
the movement amounts of only a predetermined number of the plurality of learning target objects are measured, and the frame image is selected as the learning target frame image if the measured movement amounts of a predetermined number of the measured learning target objects are equal to or greater than the predetermined criterion, or if the measured movement amounts of all of the measured learning target objects are equal to or greater than the predetermined criterion.

A raw video receiving unit configured to receive a raw video for artificial intelligence image learning;
A frame image extracting unit configured to extract a predetermined number of frame images per predetermined time interval from the received raw video;
A learning target object detection unit configured to detect one or more learning target objects of one or more predetermined types on each of the frame images using an object detection algorithm;
A background removal unit configured to remove the background, excluding the learning target objects, from each of the frame images;
A movement amount measurement unit configured to measure the movement amount of each of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) by comparing the positions of the detected learning target objects on the background-removed n-th frame image with their positions on the background-removed frame image immediately preceding the n-th frame image; and
A learning target frame image selecting unit configured to select the n-th frame image as a learning target frame image by comparing the measured movement amount of each of the one or more learning target objects detected on the n-th frame image with a predetermined criterion,
When a plurality of learning target objects are included in one frame image,
The movement amount of each of the plurality of learning target objects is measured, and the frame image is selected as the learning target frame image when the measured movement amounts of a predetermined number of the plurality of learning target objects are equal to or greater than the predetermined criterion, or when the measured movement amounts of all of the plurality of learning target objects are equal to or greater than the predetermined criterion, or
Only the movement amounts of a predetermined number of the plurality of learning target objects are measured, and the frame image is selected as the learning target frame image when the measured movement amounts of a predetermined number of the measured learning target objects are equal to or greater than the predetermined criterion, or when the measured movement amounts of all of the measured learning target objects are equal to or greater than the predetermined criterion,
Learning target frame image sampling device of a video for AI image learning.
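
For illustration only, the multi-object selection rule recited in the claim above might be sketched as follows. The function names `movement_amount` and `select_frame`, the parameters `min_moved` and `measure_only`, the use of a single (x, y) position per object, and the Euclidean distance metric are all assumptions made for this sketch; the claim does not prescribe how positions or movement amounts are computed.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # assumed: one (x, y) position per detected object


def movement_amount(prev: Point, curr: Point) -> float:
    """Distance between an object's positions in frames n-1 and n (assumed metric)."""
    return math.hypot(curr[0] - prev[0], curr[1] - prev[1])


def select_frame(
    prev_pos: Dict[str, Point],
    curr_pos: Dict[str, Point],
    criterion: float,
    min_moved: Optional[int] = None,
    measure_only: Optional[int] = None,
) -> bool:
    """Decide whether frame n is selected as a learning target frame image.

    measure_only: if set, only that many objects are measured (second alternative).
    min_moved: if set, that many measured objects must meet the criterion;
               if None, all measured objects must meet it.
    """
    # Only objects detected in both the (n-1)-th and n-th frames can be compared.
    ids = [obj_id for obj_id in curr_pos if obj_id in prev_pos]
    if measure_only is not None:
        ids = ids[:measure_only]
    if not ids:
        return False
    moved = sum(
        1
        for obj_id in ids
        if movement_amount(prev_pos[obj_id], curr_pos[obj_id]) >= criterion
    )
    required = len(ids) if min_moved is None else min_moved
    return moved >= required
```

With hypothetical positions `prev = {"a": (0, 0), "b": (10, 10)}` and `curr = {"a": (3, 4), "b": (10, 11)}` and a criterion of 2.0, object `a` moves by 5 and object `b` by 1, so the frame is selected when one moved object suffices (`min_moved=1`) but not when all objects must meet the criterion.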

The device of claim 7,
Further comprising a learning video set generation unit configured to process the selected learning target frame image to generate a learning video set,
Learning target frame image sampling device of a video for AI image learning.

Comprising an artificial intelligence image learning unit configured to perform artificial intelligence image learning using a learning video set,
Wherein the learning video set is generated by a learning target frame image sampling device of a video for artificial intelligence image learning, the sampling device comprising:
A raw video receiving unit configured to receive a raw video for artificial intelligence image learning;
A frame image extracting unit configured to extract a predetermined number of frame images per predetermined time interval from the received raw video;
A learning target object detection unit configured to detect one or more learning target objects of one or more predetermined types on each of the frame images using an object detection algorithm;
A background removal unit configured to remove the background, excluding the learning target objects, from each of the frame images;
A movement amount measurement unit configured to measure the movement amount of each of the one or more learning target objects detected on the n-th frame image (n is a natural number of 2 or more) by comparing the positions of the detected learning target objects on the background-removed n-th frame image with their positions on the background-removed frame image immediately preceding the n-th frame image;
A learning target frame image selecting unit configured to select the n-th frame image as a learning target frame image by comparing the measured movement amount of each of the one or more learning target objects detected on the n-th frame image with a predetermined criterion; and
A learning video set generation unit configured to process the selected learning target frame image to generate the learning video set,
When a plurality of learning target objects are included in one frame image,
The movement amount of each of the plurality of learning target objects is measured, and the frame image is selected as the learning target frame image when the measured movement amounts of a predetermined number of the plurality of learning target objects are equal to or greater than the predetermined criterion, or when the measured movement amounts of all of the plurality of learning target objects are equal to or greater than the predetermined criterion, or
Only the movement amounts of a predetermined number of the plurality of learning target objects are measured, and the frame image is selected as the learning target frame image when the measured movement amounts of a predetermined number of the measured learning target objects are equal to or greater than the predetermined criterion, or when the measured movement amounts of all of the measured learning target objects are equal to or greater than the predetermined criterion,
AI image learning device.
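
As a rough illustration of the frame image extracting unit recited above, extracting a predetermined number of frame images per predetermined time interval can be reduced to choosing evenly spaced frame indices. The function name `sample_frame_indices`, its parameters, and the even-spacing policy are assumptions of this sketch; the claims do not specify how the frames within an interval are chosen.

```python
from typing import List


def sample_frame_indices(
    total_frames: int,
    fps: float,
    frames_per_interval: int,
    interval_seconds: float,
) -> List[int]:
    """Return indices of frames to extract from the raw video.

    Picks `frames_per_interval` evenly spaced frames in every window of
    `interval_seconds` seconds (even spacing is an assumed policy).
    """
    if total_frames < 0 or fps <= 0 or frames_per_interval <= 0 or interval_seconds <= 0:
        raise ValueError("arguments must be positive (total_frames may be zero)")
    # Number of source frames between two consecutive extracted frames.
    step = fps * interval_seconds / frames_per_interval
    indices: List[int] = []
    k = 0
    while True:
        idx = int(k * step)
        if idx >= total_frames:
            break
        # Skip duplicates when more frames are requested than the video contains.
        if not indices or idx != indices[-1]:
            indices.append(idx)
        k += 1
    return indices
```

For a hypothetical 3-second raw video at 30 frames per second, extracting one frame per second gives the indices 0, 30, and 60; each extracted frame would then pass through object detection, background removal, and movement amount measurement.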

A program for sampling a learning target frame image of a video for AI image learning, the program being stored in a recording medium and combined with a computer, which is hardware, to execute the method of any one of claims 1 and 3.