KR20230103887A

KR20230103887A - System and method for searching contents in accordance with advertisements

Info

Publication number: KR20230103887A
Application number: KR1020220072846A
Authority: KR
Inventors: 김세은; 박동찬; 오재호
Original assignee: 주식회사 파일러
Priority date: 2021-12-30
Filing date: 2022-06-15
Publication date: 2023-07-07
Also published as: KR102411095B1

Abstract

Disclosed are an advertisement-suitable content search system and method that match an optimal advertisement by identifying accurate context through video analysis technology and, further, ensure brand safety against unwholesome elements and criticism. An advertisement-suitable content search system according to an embodiment of the present invention includes: a context extraction unit that analyzes video content based on video captioning, generates a video caption for each time-series section, and thereby extracts the context of the video content; and an advertisement matching unit that matches advertisement content to each time-series section based on the context extracted for that section from the video content.

Description

Ad suitable content search system and method {SYSTEM AND METHOD FOR SEARCHING CONTENTS IN ACCORDANCE WITH ADVERTISEMENTS}

The present invention relates to an advertisement-suitable content search system and method that analyze content to provide and manage a context-tailored advertising service for digital marketing.

Various methods have been tried to improve marketing performance in the digital environment. However, most conventional digital marketing methods require enormous amounts of data for advertisement execution and performance analysis. The information collected for this purpose includes metrics used to measure advertising performance, such as impressions, clicks, and views, as well as data obtained by directly tracking user behavior, such as search queries and interests, for advertisement delivery.

However, as privacy concerns have recently come to the fore, leading companies in Korea and abroad are implementing policies that restrict the tracking of users' personal information, making it increasingly difficult to run advertisements based on tracked user-behavior data. Therefore, contextual marketing, which provides advertisements related to the substance and context of content, is needed so that accurate advertisements can be delivered to consumers without using personal information.

In particular, the share of video platforms in the digital market is increasing, and contextual marketing needs to be applied to video advertising. Common contextual marketing methods go no further than using text information located around the video (or subtitle information included in the content). In that case, the exact context within the video cannot be determined, and incorrect context information may be derived, so that an advertisement suitable for the content cannot be provided.

The present invention is intended to provide an advertisement-suitable content search system and method capable of matching an optimal advertisement by identifying the accurate context of video content through video analysis technology.

In addition, the present invention is intended to provide an advertisement-suitable content search system and method that, through extensive in-video context analysis based on multi-modal video captioning, search for content suitable for an advertisement based on the vision and audio information in a video and determine the advertisement presentation time.

An advertisement-suitable content search system according to an embodiment of the present invention includes: a context extraction unit that analyzes video content based on video captioning, generates a video caption for each time-series section, and thereby extracts the context of the video content; and an advertisement matching unit that matches advertisement content to each time-series section based on the context extracted for that section from the video content.

The context extraction unit may include: an information extraction unit that extracts behavior information of an object, sound information, and subtitle information from the video content; and a video caption unit that generates the video caption through multi-modal analysis based on the behavior information, the sound information, and the subtitle information.

The video caption unit may include: an encoder unit that generates a vision encoder vector and an audio encoder vector through multi-modal analysis based on the vision data and the audio data; and a decoder unit that generates a subtitle attention vector by applying self-attention to subtitle data related to the video data based on learned subtitle key values, and generates the video caption by applying multi-modal attention to the subtitle attention vector, the vision encoder vector, and the audio encoder vector.

The encoder unit may include: a vision self-attention unit that generates a vision attention vector by applying self-attention to the vision data based on learned vision key values; an audio self-attention unit that generates an audio attention vector by applying self-attention to the audio data based on learned audio key values; a first multi-modal attention unit that generates a first feature vector by performing multi-modal analysis based on the vision attention vector and the audio attention vector; a second multi-modal attention unit that generates a second feature vector by performing multi-modal analysis based on the vision attention vector and the audio attention vector; a first fully connected layer that generates the vision encoder vector from the first feature vector generated by the first multi-modal attention unit; and a second fully connected layer that generates the audio encoder vector from the second feature vector generated by the second multi-modal attention unit.
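The encoder structure described above can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the specification does not give layer sizes, which modality serves as the query in each multi-modal attention unit, or how the learned key values are applied. Here the learned keys are modeled as a linear projection, the first multi-modal attention unit lets vision features attend to audio features, the second does the reverse, and random matrices stand in for trained parameters.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)

def random_matrix(rows, cols, rng):
    """Stand-in for a learned parameter matrix (hypothetical, untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def encode(vision, audio, dim, rng):
    """Return (vision_encoder_vectors, audio_encoder_vectors) for one section."""
    vision_keys = random_matrix(dim, dim, rng)  # assumed form of "learned vision key values"
    audio_keys = random_matrix(dim, dim, rng)   # assumed form of "learned audio key values"
    fc1 = random_matrix(dim, dim, rng)          # first fully connected layer
    fc2 = random_matrix(dim, dim, rng)          # second fully connected layer

    # Self-attention within each modality.
    vision_att = attention(vision, matmul(vision, vision_keys), vision)
    audio_att = attention(audio, matmul(audio, audio_keys), audio)

    # First multi-modal attention: vision queries attend to audio (assumed direction).
    feature1 = attention(vision_att, audio_att, audio_att)
    # Second multi-modal attention: audio queries attend to vision (assumed direction).
    feature2 = attention(audio_att, vision_att, vision_att)

    return matmul(feature1, fc1), matmul(feature2, fc2)
```

With four vision frames and six audio frames of dimension 8, `encode` returns a 4x8 vision encoder output and a 6x8 audio encoder output, so each modality keeps its own time axis while carrying information from the other.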

The advertisement matching unit may estimate the suitability between an advertisement and the video content based on information extracted from the video caption, match an advertisement related to the video content, and determine an advertisement presentation time at which the advertisement is displayed during playback of the video content.

The advertisement-suitable content search system according to an embodiment of the present invention may further include a management unit that checks logs of advertisements displayed in video content and manages, for each item of video content, the advertising performance resulting from advertisement display.

An advertisement-suitable content search method according to an embodiment of the present invention includes: analyzing, by a context extraction unit, video content based on video captioning, generating a video caption for each time-series section, and thereby extracting the context of the video content; and matching, by an advertisement matching unit, advertisement content to each time-series section based on the context extracted for that section from the video content.

The advertisement-suitable content search method according to an embodiment of the present invention may further include extracting, by an information extraction unit, behavior information of an object, sound information, and subtitle information from the video content.

The extracting of the context may include generating the video caption through multi-modal analysis based on the behavior information, the sound information, and the subtitle information.

The generating of the video caption may include: dividing the video content into vision data and audio data; dividing the time-series sections by setting action stop points based on the vision data; and generating, by an artificial intelligence model, the video caption related to the behavior of the object for each time-series section through multi-modal analysis of a vision mode and an audio mode based on the vision data and the audio data.
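The section-splitting step can be sketched as follows. The specification does not say how an action stop point is detected, so this sketch assumes a hypothetical per-frame motion score (for example, a frame-difference magnitude) and treats any frame whose score falls below a threshold as a stop point. The function name, the threshold, and the `fps` parameter are illustrative only.

```python
def split_at_stop_points(motion, threshold, fps=1.0):
    """Split a video timeline into (start_sec, end_sec) sections.

    motion: per-frame motion scores (hypothetical input derived from vision data).
    A section opens at the first frame with motion >= threshold and closes at
    the next frame whose motion drops below threshold (the assumed stop point).
    """
    sections = []
    start = None
    for index, score in enumerate(motion):
        moving = score >= threshold
        if moving and start is None:
            start = index
        elif not moving and start is not None:
            sections.append((start / fps, index / fps))
            start = None
    if start is not None:
        # The video ended while an action was still in progress.
        sections.append((start / fps, len(motion) / fps))
    return sections
```

Each returned section would then be captioned independently, which is what allows advertisements to be matched per time-series section rather than per video.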

The generating of the video caption may include: generating, by an encoder unit, a vision encoder vector and an audio encoder vector through multi-modal analysis based on the vision data and the audio data; generating, by a decoder unit, a subtitle attention vector by applying self-attention to subtitle data related to the video data based on learned subtitle key values; and generating, by the decoder unit, the video caption by applying multi-modal attention to the subtitle attention vector, the vision encoder vector, and the audio encoder vector.

The generating of the vision encoder vector and the audio encoder vector may include: generating, by a vision self-attention unit, a vision attention vector by applying self-attention to the vision data based on learned vision key values; generating, by an audio self-attention unit, an audio attention vector by applying self-attention to the audio data based on learned audio key values; generating the vision encoder vector by inputting the vision attention vector and the audio attention vector to a first multi-modal attention unit; and generating the audio encoder vector by inputting the vision attention vector and the audio attention vector to a second multi-modal attention unit.

The matching of the advertisement content may include: estimating the suitability between an advertisement and the video content based on information extracted from the video caption, and matching an advertisement related to the video content; and determining an advertisement presentation time at which the advertisement is displayed during playback of the video content.

The advertisement-suitable content search method according to an embodiment of the present invention may further include checking, by a management unit, logs of advertisements displayed in video content and managing, for each item of video content, the advertising performance resulting from advertisement display.

According to an embodiment of the present invention, a computer program recorded on a computer-readable recording medium is provided to execute the advertisement-suitable content search method.

According to an embodiment of the present invention, an advertisement-suitable content search system and method are provided that can match an optimal advertisement by identifying the accurate context of video content through video analysis technology.

In addition, according to an embodiment of the present invention, an advertisement-suitable content search system and method are provided that, through extensive in-video context analysis based on multi-modal video captioning, search for content suitable for an advertisement based on the vision and audio information in a video and determine the advertisement presentation time.

FIG. 1 is a configuration diagram of an advertisement-suitable content search system according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram illustrating a service of the advertisement-suitable content search system according to an embodiment of the present invention.
FIG. 3 is a configuration diagram showing the advertisement-suitable content search system according to an embodiment of the present invention in more detail.
FIG. 4 is a configuration diagram of a video caption unit constituting the advertisement-suitable content search system according to an embodiment of the present invention.
FIG. 5 is a flowchart of an advertisement-suitable content search method according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the advertisement-suitable content search method according to an embodiment of the present invention in more detail.
FIG. 7 is an exemplary diagram showing a result of extracting the context of video content by the advertisement-suitable content search method according to an embodiment of the present invention.
FIG. 8 is an exemplary diagram showing a result of analyzing metadata according to an embodiment of the present invention.
FIG. 9 is an exemplary diagram showing a result of analyzing advertising performance and trends according to an embodiment of the present invention.
FIG. 10 is an exemplary diagram showing an image caption analysis result produced by the advertisement-suitable content search method according to an embodiment of the present invention.
FIG. 11 is an exemplary diagram showing a video caption analysis result produced by the advertisement-suitable content search method according to an embodiment of the present invention.
FIG. 12 is an exemplary diagram showing a text processing result produced by the advertisement-suitable content search method according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention belongs are fully informed of the scope of the invention; the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

In this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. The terms '~module' and '~unit' as used herein denote a unit that processes at least one function or operation, and may refer to, for example, software or a hardware component such as an FPGA or one or more processors. In describing the embodiments of the present invention, detailed descriptions of related well-known functions or configurations are omitted where they are judged likely to obscure the gist of the present invention unnecessarily.

The advertisement-suitable content search system and method according to an embodiment of the present invention provide and manage context-tailored advertisements and content analysis information. They identify the accurate context of video content through video analysis technology and search for advertisement-suitable content based on an understanding of the video's context and substance. In this way they match and provide the optimal advertisement related to the substance and context of the content, allow accurate advertisements to be delivered to consumers without using personal information, and further ensure brand safety against unwholesome elements and criticism.

The advertisement-suitable content search system and method according to an embodiment of the present invention determine, through extensive in-video context analysis based on multi-modal video captioning, the content most suitable for carrying an advertisement and the advertisement presentation time, based on the vision and audio information in the video. The system and method can be offered to advertisers and others as a service in the form of a SaaS platform, and can run advertisements in video and display formats within a video platform.

FIG. 1 is a conceptual diagram of an advertisement-suitable content search system according to an embodiment of the present invention. Referring to FIG. 1, the advertisement-suitable content search system 100 according to an embodiment of the present invention may include a context analysis unit 110, an automation pipeline 120, and a SaaS platform 130.

The context analysis unit 110 may include a video caption unit 112 that performs context analysis through object recognition, and a text processing unit 114 that identifies text-centered context within the video content.

Using the context analysis data produced by the context analysis unit 110, the advertisement-suitable content search system 100 may build the automation pipeline 120 based on MLOps 122 and provide a service through the SaaS platform 130. The SaaS platform 130 may be implemented to include an Action REST API 132 in a cloud system and a user interface unit 134 equipped with a user-customizable analysis solution.

FIG. 2 is a configuration diagram of an advertisement-suitable content search system according to an embodiment of the present invention. Referring to FIG. 2, the advertisement-suitable content search system according to an embodiment of the present invention may include an analysis unit 310, a video analysis AI server 320, a determination unit 330, a management unit 340, and a SaaS platform 350.

The analysis unit 310 may include a video caption unit and a context extraction unit that, based on the information extracted by the information extraction unit, integrate and analyze the various pieces of information in the video and, from the result, extract information reflecting an understanding of the content that is relevant to the advertisement. The analysis unit 310 exchanges information with the video analysis AI server 320, performs video analysis on content suitable for the advertisement, and can search for and organize content suitable for the advertisement using natural language analysis technology.

The determination unit 330 may include an advertisement matching unit that estimates suitability with an advertisement based on the extracted information. The determination unit 330 checks the relevance between the advertisement video and the content so that the advertisement is placed only at the most suitable position (playback time within the video content), and can thereby manage placement to maximize the advertising effect within an optimal budget.

The management unit 340 may check logs of advertisements displayed in video content and manage, for each item of video content, the advertising performance resulting from advertisement display. In addition, the management unit 340 may continuously check and automatically manage advertising performance and brand safety for each item of content. The management unit 340 may be implemented in conjunction with the SaaS platform 350 so that the advertising performance for content can be checked conveniently.
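As a rough illustration of per-content performance management, the sketch below aggregates an advertisement log into impressions, clicks, and click-through rate for each video. The log schema (`video_id`, `ad_id`, `event`) and the choice of click-through rate as the performance measure are assumptions for this example; the specification does not define a log format or a specific metric.

```python
from collections import defaultdict

def summarize_ad_logs(logs):
    """Aggregate advertisement log entries per video.

    logs: iterable of dicts with hypothetical keys 'video_id', 'ad_id', and
    'event', where 'event' is either 'impression' or 'click'.
    Returns {video_id: {'impressions': int, 'clicks': int, 'ctr': float}}.
    """
    counts = defaultdict(lambda: {"impression": 0, "click": 0})
    for entry in logs:
        if entry["event"] in ("impression", "click"):
            counts[entry["video_id"]][entry["event"]] += 1

    report = {}
    for video_id, tally in counts.items():
        impressions = tally["impression"]
        clicks = tally["click"]
        report[video_id] = {
            "impressions": impressions,
            "clicks": clicks,
            # Avoid division by zero for videos with clicks logged but no impressions.
            "ctr": clicks / impressions if impressions else 0.0,
        }
    return report
```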

The advertisement-suitable content search system according to the embodiment described above identifies the information on the advertisement to be run, extracts context from content such as videos based on video captioning, matches the content most suitable for the advertisement, and analyzes the results.

In addition, a log of which content an advertisement was placed on can be checked, and trends are analyzed using information that can be extracted from the content as speech or text (for example, titles, subtitles, descriptions, and comments), enabling performance marketing through more accurate context matching.

The video analysis technology does not rely on a single type of information in the video; it includes a simultaneous multi-information analysis function that comprehensively analyzes visual and auditory elements. By analyzing content using the combined behavior information, sound information, subtitle information, and the like of objects recognized in the video, it is differentiated from other technologies that perform simple recognition only, and it can greatly improve the accuracy of video context understanding.

The natural language analysis technology may include STT (speech-to-text) technology that extracts speech information such as dialogue and sounds in the video as text with very high accuracy, technology that automatically splits the extracted text into sentences, and technology that summarizes the key points of an entire dialogue or script. Combined with the video analysis technology, it can maximize the amount of information obtainable from data such as utterances and descriptions.
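The sentence-splitting and summarization steps can be sketched in a simplified form. This example assumes the STT output is already punctuated, splits on sentence-ending punctuation, and uses a word-frequency extractive summary. These are stand-in techniques chosen for brevity; the specification does not state which segmentation or summarization models are used.

```python
import re
from collections import Counter

def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, k=1):
    """Return the k highest-scoring sentences, kept in their original order.

    A sentence's score is the average corpus frequency of its words, so
    sentences built from frequently repeated words rank higher.
    """
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]
```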

According to an embodiment of the present invention, advertising performance can be improved and brand safety accurately protected by providing and managing video content analysis information that is optimally matched to advertisements. In addition, because the advertisement position likely to yield the best performance is designated precisely through AI technology, the advertising budget can be used most efficiently.

FIG. 3 is a configuration diagram showing the advertisement-suitable content search system according to an embodiment of the present invention in more detail. Referring to FIG. 3, the advertisement-suitable content search system according to an embodiment of the present invention may include: an information extraction unit 116 that extracts behavior information of an object, sound information, and subtitle information from video content; a video caption unit 112 that analyzes the video content based on video captioning and generates a video caption (video context) for each time-series section; a context extraction unit 117 that extracts the context of the video content based on the video captions generated by the video caption unit 112; and an advertisement matching unit 118 that matches advertisement content to each time-series section based on the context extracted for that section from the video content. The video caption unit 112 may generate the video caption through multi-modal analysis based on the behavior information, sound information, and subtitle information extracted by the information extraction unit 116.

Based on the context information extracted from the video captions generated by the video caption unit 112, the advertisement matching unit 118 may estimate the suitability between an advertisement and the video content, match an advertisement related to the video content, and determine the advertisement presentation time (one or more time-series sections) at which the advertisement is displayed during playback of the video content.
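A minimal sketch of per-section matching follows. It assumes each time-series section already has a text caption and each advertisement has a text description, and it uses bag-of-words cosine similarity as a stand-in suitability score. The specification does not define how suitability is computed, so the scoring function, the `min_score` cutoff, and all names here are illustrative.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def match_ads(captions, ads, min_score=0.1):
    """Pick the best-scoring advertisement for each captioned section.

    captions: list of (start_sec, end_sec, caption_text).
    ads: dict mapping a hypothetical ad_id to its description text.
    Returns (start_sec, end_sec, ad_id, score) for every section whose best
    score reaches min_score; these sections are the candidate presentation times.
    """
    placements = []
    for start, end, caption in captions:
        caption_vec = bag_of_words(caption)
        best_id, best_score = None, 0.0
        for ad_id, description in ads.items():
            score = cosine(caption_vec, bag_of_words(description))
            if score > best_score:
                best_id, best_score = ad_id, score
        if best_id is not None and best_score >= min_score:
            placements.append((start, end, best_id, best_score))
    return placements
```

Sections with no sufficiently similar advertisement are simply left without a placement, which mirrors the idea of showing an advertisement only where the context fits.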

The following describes in more detail a method of performing context-tailored digital marketing using video analysis technology, which extracts feature sections and the context of the video, and text analysis technology, which supports suitability and safety through summarization, sentiment analysis, and the like.

The advertisement-suitable content search system and method according to an embodiment of the present invention may be implemented as a video analysis system using video captioning technology that goes beyond simple object recognition: it extracts feature sections from the input video content and uses them to generate captions.

The advertisement-suitable content search system according to an embodiment of the present invention provides a comprehensive solution for context analysis based on titles, audio, descriptions, and comments, and builds video- and channel-customized services through processes including automatic speech recognition, text summarization, and sentiment analysis. Furthermore, the text keywords, tagging tasks, and public opinion extracted as results of text processing are used for trend analysis and for creating a matching environment.

The advertisement-suitable content search system according to an embodiment of the present invention may build a user-defined interface organized by importance, based on video captions and text processing information, for comprehensive advertisement-centered analysis of video content. Using the feature vectors in multiple frames extracted through video captioning and the word embedding values extracted through text processing, clustering may be performed separately for each of four sections: behavior, public opinion, keywords, and content.
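
As a rough illustration of per-section clustering, the sketch below groups vectors independently within each section. The specification does not name a clustering algorithm; k-means with a deterministic initialization is used here purely as an assumption, and the function names, the value of `k`, and the two-dimensional sample vectors are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def cluster_by_section(vectors_by_section, k=2):
    """Cluster the vectors of each section (behavior, keyword, ...) separately."""
    return {name: kmeans(vectors, k) for name, vectors in vectors_by_section.items()}

# Toy vectors standing in for frame feature vectors and word embeddings.
vectors_by_section = {
    "behavior": [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)],
    "keyword": [(1.0, 1.0), (9.0, 9.0), (1.1, 0.9)],
}
clusters = cluster_by_section(vectors_by_section)
print(clusters)
```

Keeping the sections separate means that, for example, behavior features and keyword embeddings never compete for the same cluster, which matches the per-section clustering the paragraph describes.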

This process may proceed in the following order: video captioning and text processing, lightweighting of the AI model, data mining for text analysis, clustering of the output data, and visualization. To this end, NLP for text processing and video-caption-based deep learning may be applied to the input video captions and the corresponding channel.

In this process, information such as video and text is indispensable. The UI/UX may be reconfigured by subdividing access paths so as to activate the items to which CRUD (Create, Read, Update, Delete) permissions, the data processing functions for that information, can be granted. In addition, in an embodiment of the present invention, to enable efficient management of information through reprocessing of video-related data, virtualized computing resources are provided on demand through the cloud for ease of access, and the information may be used in SaaS form.

FIG. 4 is a block diagram of the video caption unit of the advertisement-suitable content search system according to an embodiment of the present invention. Referring to FIGS. 1 to 4, the video caption unit 200 may be configured to input the vision data and audio data analyzed by the VGGish processing unit 20 and the I3D processing unit 30 to the encoder unit 210.

The video caption unit 200 may include an encoder unit 210 that generates a vision encoder vector and an audio encoder vector through multi-modal analysis based on the vision data and the audio data, and a decoder unit 250 that generates a caption attention vector by performing self-attention processing on caption data related to the video data based on learned caption key values, and generates a video caption by performing multi-modal attention processing on the caption attention vector, the vision encoder vector, and the audio encoder vector.

The encoder unit 210 may include a vision self-attention unit 211 that generates a vision attention vector by performing self-attention processing on the vision data based on learned vision key values; an audio self-attention unit 212 that generates an audio attention vector by performing self-attention processing on the audio data based on learned audio key values; a first multi-modal attention unit 213 that generates a first feature vector by performing multi-modal analysis based on the vision attention vector and the audio attention vector; a second multi-modal attention unit 214 that generates a second feature vector by performing multi-modal analysis based on the vision attention vector and the audio attention vector; a first fully connected layer 215 that generates the vision encoder vector from the first feature vector generated by the first multi-modal attention unit 213; and a second fully connected layer 216 that generates the audio encoder vector from the second feature vector generated by the second multi-modal attention unit 214.

The artificial intelligence model may include output units 220 and 230 that output the output values of the encoder unit 210, and a feedback unit 240 that feeds the output values of the output units 220 and 230 back to the input of the encoder unit 210 in order to train the artificial intelligence model.

The decoder unit 250 may include a self-attention unit 251 that generates a caption attention vector by performing self-attention processing on caption data related to the video data (video content) based on learned caption key values; a multi-modal attention unit 252 that performs multi-modal attention processing on the caption attention vector, the vision encoder vector, and the audio encoder vector; and a fully connected layer 253 that generates and outputs a video caption from the feature vector resulting from the multi-modal attention processing. The caption data related to the video data may be obtained by the caption unit 242.
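
The data flow through the encoder unit 210 and the decoder unit 250 can be traced with a small sketch. It is not the disclosed model: learned query, key, and value projections are omitted, attention is plain scaled dot-product attention, the choice of which modality supplies the queries in units 213 and 214 is an assumption, the two attended streams in the decoder are simply averaged, and all weights and inputs are made-up toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def linear(rows, weight):
    """Bias-free fully connected layer: each row (d_in) times weight (d_in x d_out)."""
    return [[sum(x * weight[i][j] for i, x in enumerate(row))
             for j in range(len(weight[0]))] for row in rows]

def encode(vision, audio, w_vision, w_audio):
    vis_att = attention(vision, vision, vision)    # unit 211: vision self-attention
    aud_att = attention(audio, audio, audio)       # unit 212: audio self-attention
    first = attention(vis_att, aud_att, aud_att)   # unit 213 (assumed: vision queries)
    second = attention(aud_att, vis_att, vis_att)  # unit 214 (assumed: audio queries)
    return linear(first, w_vision), linear(second, w_audio)  # layers 215 and 216

def decode(caption, vis_enc, aud_enc, w_out):
    cap_att = attention(caption, caption, caption)   # unit 251: caption self-attention
    from_vis = attention(cap_att, vis_enc, vis_enc)  # unit 252: attend to vision
    from_aud = attention(cap_att, aud_enc, aud_enc)  # unit 252: attend to audio
    fused = [[(a + b) / 2 for a, b in zip(rv, ra)]   # assumed fusion: average
             for rv, ra in zip(from_vis, from_aud)]
    return linear(fused, w_out)                      # layer 253: per-token logits

identity = [[1.0, 0.0], [0.0, 1.0]]
vision = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 vision steps, dimension 2
audio = [[0.5, 0.5], [1.0, 0.0]]               # 2 audio steps, dimension 2
caption = [[1.0, 0.0], [0.0, 1.0]]             # 2 caption tokens, dimension 2
w_out = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]     # projects to a 3-word toy vocabulary

vis_enc, aud_enc = encode(vision, audio, identity, identity)
logits = decode(caption, vis_enc, aud_enc, w_out)
print(len(vis_enc), len(aud_enc), len(logits), len(logits[0]))
```

The sketch only shows which vectors feed which unit; training through the output units 220 and 230 and the feedback unit 240 is not modeled.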

FIG. 5 is a flowchart of the advertisement-suitable content search method according to an embodiment of the present invention. Referring to FIG. 5, the method may include a step (S10) in which the context extraction unit analyzes video content based on video captioning, generates a video caption for each time-series section, and thereby extracts the context of the video content, and a step (S20) in which the advertisement matching unit matches advertisement content to each time-series section based on the context extracted for that section.

FIG. 6 is a flowchart showing the advertisement-suitable content search method according to an embodiment of the present invention in more detail. Referring to FIGS. 4 and 6, the method may include extracting object behavior information, sound information, and subtitle information from video content (S12); generating a video caption through multi-modal analysis based on the behavior information, sound information, and subtitle information (S14); and estimating the suitability between an advertisement and the video content based on the information extracted from the video caption, matching an advertisement related to the video content, and determining an advertisement display time at which the advertisement is displayed during playback of the video content (S22).

The advertisement-suitable content search method according to an embodiment of the present invention may include dividing the video content into vision data and audio data; dividing the content into time-series sections by setting action stopping points based on the vision data; and generating, by the artificial intelligence model, a video caption related to the behavior of an object for each time-series section through multi-modal analysis of a vision mode and an audio mode based on the vision data and the audio data.
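
The section-splitting step can be sketched as follows. The specification does not define how an action stopping point is detected, so the rule used here is an assumption: a stopping point is a frame where a per-frame motion magnitude falls below a threshold after having been at or above it. The function name, the `threshold` and `min_len` parameters, and the motion values are hypothetical.

```python
def split_by_stopping_points(motion, fps, threshold=0.1, min_len=2):
    """Split a video into time-series sections at assumed action stopping points.

    motion: per-frame motion magnitude derived from the vision data
    fps: frames per second, used to convert frame indices to seconds
    Returns a list of (start_sec, end_sec) sections covering the whole video.
    """
    sections = []
    start = 0
    for i in range(1, len(motion)):
        # Motion drops below the threshold right after an active frame,
        # and the current section is already long enough to close.
        if motion[i] < threshold <= motion[i - 1] and i - start >= min_len:
            sections.append((start / fps, i / fps))
            start = i
    if start < len(motion):
        sections.append((start / fps, len(motion) / fps))
    return sections

motion = [0.5, 0.6, 0.05, 0.04, 0.7, 0.8, 0.9, 0.02, 0.3, 0.4]
sections = split_by_stopping_points(motion, fps=1)
print(sections)
```

Each resulting section would then be captioned on its own, so the captions and the advertisements matched to them stay aligned with the same time boundaries.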

Generating the video caption may include generating, by the encoder unit 210, a vision encoder vector and an audio encoder vector through multi-modal analysis based on the vision data and the audio data; generating, by the decoder unit 250, a caption attention vector by performing self-attention processing on caption data related to the video data based on learned caption key values; and generating, by the decoder unit 250, the video caption by performing multi-modal attention processing on the caption attention vector, the vision encoder vector, and the audio encoder vector.

Generating the vision encoder vector and the audio encoder vector may include generating, by the vision self-attention unit 211, a vision attention vector by performing self-attention processing on the vision data based on learned vision key values; generating, by the audio self-attention unit 212, an audio attention vector by performing self-attention processing on the audio data based on learned audio key values; generating the vision encoder vector by inputting the vision attention vector and the audio attention vector to the first multi-modal attention unit 213; and generating the audio encoder vector by inputting the vision attention vector and the audio attention vector to the second multi-modal attention unit 214.

FIG. 7 is an exemplary diagram showing the result of extracting the context of video content by the advertisement-suitable content search method according to an embodiment of the present invention. According to the method, the advertisement best suited to a given video content can be selected based on the context analysis result, which increases the advertising effect.

FIG. 8 is an exemplary diagram showing the result of analyzing metadata according to an embodiment of the present invention. FIG. 9 is an exemplary diagram showing the result of analyzing advertisement performance and trends according to an embodiment of the present invention. The advertisement-suitable content search method according to an embodiment of the present invention may include checking, by the management unit, the log of advertisements displayed in video content and managing the advertisement performance resulting from advertisement display for each video content. Through the SaaS platform, the user can conveniently check information on the content in which an advertisement was served (that is, where the advertisement was placed), as well as advertisement performance and trends.
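
A minimal sketch of the per-content performance management described above is given below. The log schema (`video_id`, `ad_id`, `event`) and the function name are hypothetical, since the specification does not describe the log format; impressions and clicks are the kind of performance indicators mentioned in the background section, and the click-through rate is added here only as an example of a derived metric.

```python
from collections import defaultdict

def summarize_ad_logs(logs):
    """Aggregate an ad display log into per-(video, ad) performance figures."""
    stats = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for entry in logs:
        counters = stats[(entry["video_id"], entry["ad_id"])]
        if entry["event"] == "impression":
            counters["impressions"] += 1
        elif entry["event"] == "click":
            counters["clicks"] += 1
    report = {}
    for key, counters in stats.items():
        shown = counters["impressions"]
        # Click-through rate; defined as 0.0 when the ad was never shown.
        ctr = counters["clicks"] / shown if shown else 0.0
        report[key] = {**counters, "ctr": ctr}
    return report

logs = (
    [{"video_id": "v1", "ad_id": "a1", "event": "impression"}] * 4
    + [{"video_id": "v1", "ad_id": "a1", "event": "click"}]
    + [{"video_id": "v2", "ad_id": "a1", "event": "impression"}]
)
report = summarize_ad_logs(logs)
for (video_id, ad_id), row in sorted(report.items()):
    print(video_id, ad_id, row)
```

Keying the report by video and advertisement together answers both questions the paragraph raises: where an advertisement was placed, and how it performed there.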

FIG. 10 is an exemplary diagram showing an image caption analysis result produced by the advertisement-suitable content search method according to an embodiment of the present invention. FIG. 11 is an exemplary diagram showing a video caption analysis result produced by the advertisement-suitable content search method according to an embodiment of the present invention.

According to an embodiment of the present invention, once the information on the advertisement to be run has been identified, context is extracted from content such as video based on video captioning, the content best suited to the advertisement is matched, and the result can be analyzed. In particular, as shown in FIG. 11, context is extracted through video captioning for each time-series section (#1, #2, #3, #4) of the video content, and an advertisement suited to the context analyzed for each time-series section (#1, #2, #3, #4) can be matched.

FIG. 12 is an exemplary diagram showing a text processing result produced by the advertisement-suitable content search method according to an embodiment of the present invention. According to an embodiment of the present invention, the original text, which includes the video captions obtained through video captioning and the subtitles contained in the video content, is summarized and additionally subjected to sentiment analysis, and the resulting text processing output can be used in searching for advertisement-suitable content.

At least some of the configurations of the embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware components and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.

A processing device may run an operating system and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will understand that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements.

For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible. Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively.

Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents. Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

100: advertisement-suitable content search system
110: context analysis unit
112: video caption unit
114: text processing unit
116: information extraction unit
117: context extraction unit
118: advertisement matching unit
120: automation pipeline
122: MLOps
130: SaaS platform
132: Action REST API
134: user interface unit
200: video caption unit
210: encoder unit
211: vision self-attention unit
212: audio self-attention unit
213: first multi-modal attention unit
214: second multi-modal attention unit
215: first fully connected layer
216: second fully connected layer
220, 230: output units
240: feedback unit
250: decoder unit
251: self-attention unit
252: multi-modal attention unit
253: fully connected layer
310: analysis unit
320: video analysis AI server
330: determination unit
340: management unit
350: SaaS platform

Claims

An advertisement-suitable content search system comprising:
a context extraction unit configured to extract a context of video content by analyzing the video content based on video captioning and generating a video caption for each time-series section; and
an advertisement matching unit configured to match advertisement content to each time-series section based on the context extracted for each time-series section from the video content.

The system of claim 1, further comprising:
an information extraction unit configured to extract object behavior information, sound information, and subtitle information from the video content; and
a video caption unit configured to generate the video caption through multi-modal analysis based on the behavior information, the sound information, and the subtitle information.

The system of claim 2,
wherein the video caption unit is configured to:
divide the video content into vision data and audio data;
divide the time-series sections by setting action stopping points based on the vision data; and
generate, by an artificial intelligence model, the video caption related to the behavior of the object through multi-modal analysis of a vision mode and an audio mode based on the vision data and the audio data for each time-series section.

The system of claim 3,
wherein the video caption unit comprises:
an encoder unit generating a vision encoder vector and an audio encoder vector through multi-modal analysis based on the vision data and the audio data; and
a decoder unit generating a caption attention vector by performing self-attention processing on caption data related to the video data based on learned caption key values, and generating the video caption by performing multi-modal attention processing on the caption attention vector, the vision encoder vector, and the audio encoder vector.

The system of claim 4,
wherein the encoder unit comprises:
a vision self-attention unit generating a vision attention vector by performing self-attention processing on the vision data based on learned vision key values;
an audio self-attention unit generating an audio attention vector by performing self-attention processing on the audio data based on learned audio key values;
a first multi-modal attention unit generating a first feature vector by performing multi-modal analysis based on the vision attention vector and the audio attention vector;
a second multi-modal attention unit generating a second feature vector by performing multi-modal analysis based on the vision attention vector and the audio attention vector;
a first fully connected layer generating a vision encoder vector from the first feature vector generated by the first multi-modal attention unit; and
a second fully connected layer generating the audio encoder vector from the second feature vector generated by the second multi-modal attention unit.

The system of claim 1,
wherein the advertisement matching unit is configured to:
match an advertisement related to the video content by estimating a degree of suitability between the advertisement and the video content based on information extracted from the video caption; and
determine an advertisement display time at which the advertisement is displayed during playback of the video content.

The system of claim 1, further comprising:
a management unit configured to check a log of advertisements displayed in video content and to manage advertisement performance resulting from advertisement display for each video content.

An advertisement-suitable content search method comprising:
extracting, by a context extraction unit, a context of video content by analyzing the video content based on video captioning and generating a video caption for each time-series section; and
matching, by an advertisement matching unit, advertisement content to each time-series section based on the context extracted for each time-series section from the video content.

The method of claim 8, further comprising:
extracting, by an information extraction unit, object behavior information, sound information, and subtitle information from the video content,
wherein extracting the context comprises generating the video caption through multi-modal analysis based on the behavior information, the sound information, and the subtitle information.

The method of claim 9,
wherein generating the video caption comprises:
dividing the video content into vision data and audio data;
dividing the time-series sections by setting action stopping points based on the vision data; and
generating, by an artificial intelligence model, the video caption related to the behavior of the object through multi-modal analysis of a vision mode and an audio mode based on the vision data and the audio data for each time-series section.

The method of claim 10,
wherein generating the video caption comprises:
generating, by an encoder unit, a vision encoder vector and an audio encoder vector through multi-modal analysis based on the vision data and the audio data;
generating, by a decoder unit, a caption attention vector by performing self-attention processing on caption data related to the video data based on learned caption key values; and
generating, by the decoder unit, the video caption by performing multi-modal attention processing on the caption attention vector, the vision encoder vector, and the audio encoder vector.

The method of claim 11,
wherein generating the vision encoder vector and the audio encoder vector comprises:
generating, by a vision self-attention unit, a vision attention vector by performing self-attention processing on the vision data based on learned vision key values;
generating, by an audio self-attention unit, an audio attention vector by performing self-attention processing on the audio data based on learned audio key values;
generating the vision encoder vector by inputting the vision attention vector and the audio attention vector to a first multi-modal attention unit; and
generating the audio encoder vector by inputting the vision attention vector and the audio attention vector to a second multi-modal attention unit.

The method of claim 8,
wherein matching the advertisement content comprises:
matching an advertisement related to the video content by estimating a degree of suitability between the advertisement and the video content based on information extracted from the video caption; and
determining an advertisement display time at which the advertisement is displayed during playback of the video content.

The method of claim 8, further comprising:
checking, by a management unit, a log of advertisements displayed in video content, and managing advertisement performance resulting from advertisement display for each video content.

A computer program recorded on a computer-readable recording medium to execute the advertisement-suitable content search method according to any one of claims 8 to 14.