KR101389730B1

KR101389730B1 - Method to create split position accordance with subjects for the video file

Info

Publication number: KR101389730B1
Application number: KR1020130065651A
Authority: KR
Inventors: 나준규
Original assignee: 스카이커뮤니티주식회사
Priority date: 2013-06-10
Filing date: 2013-06-10
Publication date: 2014-04-28

Abstract

The present invention relates to a method for generating split positions that divides a video file into sections by subject by analyzing the file's sound source (audio) and images. The method comprises the steps of: (A) extracting a sound source from a file to be analyzed; (B) dividing the sound source into set time intervals and extracting the words it contains; (C) calculating accumulated values of the extracted words and analyzing their distribution; and (D) generating split positions of the file to be analyzed according to the rate of change of the accumulated values. According to the present invention, the playback sections of a video file covering various subjects are split by matching keywords extracted from its sound source and its images, so that the subject of each split video is made clear.
[Reference numerals] (AA) Start; (BB) No; (CC) Yes; (DD) End; (S110) Execute the file to be analyzed; (S210) Extract the sound source; (S220) Extract words; (S230) Correct the words; (S240) Accumulate and store the extracted words; (S250) Analyze the distribution of representative words; (S310) Extract the video; (S320) Recognize and extract text; (S330) Image change degree >= setting value?; (S340) Store the video position and key word; (S410) Split into sectors

Description

{METHOD TO CREATE SPLIT POSITION ACCORDANCE WITH SUBJECTS FOR THE VIDEO FILE}

The present invention relates to a method for generating subject-based split positions for a video file and, more particularly, to a method that analyzes the sound source and images of a video file and divides the video content into sections by subject.

Recently, with the development of Internet technology, online services have been expanding through large-capacity file transfer and real-time streaming of high-definition video. The variety of video data has also grown enormously, and users increasingly want to be provided with video data selected and customized to the specific portion they are interested in.

A representative example is the video lecture service. Because learners do not have to attend a lecture hall in person, they can study effectively regardless of time and place, and numerous video lectures are provided on the Internet. However, video lectures are generally about 50 minutes long or longer, which lowers learners' concentration, and learners cannot quickly and accurately find the content they want to study.

In addition, with the development of portable digital devices, more learners take video lectures for short periods while on the move, which increases the need for short video lectures divided by subject.

To meet these needs, Korean Patent Registration No. 10-0997599 discloses a technology that divides a video into a plurality of scenes and generates metadata for each divided scene. Specifically, as shown in FIG. 1, it provides a system comprising a scene divider that generates, for each divided scene, section information metadata indicating the start and end positions of the scene, and a scene change detector that, based on the section information metadata from the scene divider, automatically detects scene change points from inter-frame pixel differences of the video, color or luminance histogram differences, and the like.

However, when the prior art automatically assigns a subject (text) to each divided scene by interpreting the captions it contains, the subject matched to a scene may be inaccurate. A user can improve accuracy by analyzing each divided scene directly and assigning a subject manually, but checking every divided scene costs the user considerable time.

(0001) Korean Patent Registration No. 10-0997599

The present invention has been devised to solve the above problems of the prior art, and its object is to provide a method for generating subject-based split positions of a video file that divides the playback section by extracting keywords and a degree of change from the sound source and the images of the video file, respectively, and matching them.

According to a feature of the present invention for achieving the above object, the method for generating subject-based split positions of a video file comprises the steps of: (A) extracting a sound source from a file to be analyzed; (B) dividing the sound source into set time intervals and extracting the words included in the sound source; (C) calculating accumulated values of the extracted words and analyzing their distribution; and (D) generating split positions of the file to be analyzed according to the rate of change of the accumulated values of the accumulated words.
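Step (D) does not specify how the "rate of change" of the accumulated values is measured. The sketch below is one possible interpretation, not the patented method: each interval's word counts are normalized into a distribution, the change between consecutive intervals is measured as half the L1 distance (total variation, from 0 to 1), and a split is proposed at each boundary where that change reaches a threshold. The function names, the 30 s interval, and the threshold of 0.5 are illustrative assumptions.

```python
from collections import Counter


def total_variation(p: Counter, q: Counter) -> float:
    """Half the L1 distance between two normalized word-count distributions (0 to 1)."""
    p_total = sum(p.values()) or 1
    q_total = sum(q.values()) or 1
    words = set(p) | set(q)
    return 0.5 * sum(abs(p[w] / p_total - q[w] / q_total) for w in words)


def split_by_change_rate(interval_counts, interval_s=30.0, threshold=0.5):
    """Return split times (seconds) at boundaries where the word distribution shifts sharply.

    interval_counts: list of Counters, one per fixed-length interval, in playback order.
    interval_s and threshold are hypothetical parameters, not values from the patent.
    """
    splits = []
    for i in range(1, len(interval_counts)):
        if total_variation(interval_counts[i - 1], interval_counts[i]) >= threshold:
            splits.append(i * interval_s)  # boundary between interval i-1 and interval i
    return splits
```

For counts that move from mostly "A" to mostly "G" in the third interval, only the boundary at 60 s is reported; smaller fluctuations in the mix of words stay below the threshold.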

Here, step (B) may include correcting recognition errors by normalizing the extracted words against pronunciation variants.

The present invention also includes a method for generating subject-based split positions of a video file comprising the steps of: (a) extracting a sound source from a file to be analyzed; (b) dividing the sound source into set time intervals and extracting the words included in the sound source; (c) calculating accumulated values of the extracted words and analyzing their distribution; (d) extracting images from the file to be analyzed; (e) recognizing the images and extracting text; (f) comparing the image change degree of the images with a setting value; (g) if, as a result of the comparison in step (f), the image change degree is equal to or greater than the setting value, storing the image position and storing the text extracted from the section preceding that image position; and (h) generating split positions of the file to be analyzed based on the change in distribution of the extracted words from step (c) and the image positions and text stored in step (g).

Here, the distribution analysis of the extracted words in step (c) may set representative words according to the extraction frequency of the extracted words and analyze the distribution of the number of extractions of each representative word over playback time.

The image change degree may be a numerical value representing the rate of change of a reference image within the images, and the setting value may be a reference value for determining, from the image change degree, whether the content of the video has changed.

The text storage of step (g) may select and store, from among the extracted texts, a key word chosen based on its number of repetitions within the section.

The selection of split positions in step (h) may include: (h1) calculating the positions at which the representative word changes, based on the change over time of the representative words extracted from the sound source; (h2) comparing the representative word with the key word extracted from the images; and (h3) if the representative word matches the key word, setting the stored image position as a split position and setting a tag that marks the representative word as the subject of the corresponding section.

If the representative word and the key word do not match, the representative word may be changed to any one of the words extracted in step (c), and step (h2) may be performed again.

Alternatively, if the representative word and the key word do not match, the key word may be changed to any one of the words in the texts extracted in step (g), and step (h2) may be performed again.

As described above, the method for generating subject-based split positions of a video file according to the present invention can be expected to provide the following effects.

For a video file covering various subjects, the present invention splits the playback section by matching keywords extracted from the sound source and the images of the video file, so the subject of each split video can be made clear.

In addition, because the subject of each split video is made clear, users can be provided in a timely manner with video on the subject they want to watch.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram showing the configuration of a prior-art editing apparatus that detects scene change points in a video.
FIG. 2 is a flowchart showing a specific embodiment of the method for generating subject-based split positions of a video file according to the present invention.
FIG. 3 is an exemplary view showing the accumulated results of words extracted from the sound source of a video file according to the present invention.
FIG. 4 is an exemplary graph showing the distribution of representative words over the playback section of a video file according to the present invention.
FIG. 5 is an exemplary graph showing image positions and key words according to the image change degree of the images of a video file according to the present invention.
FIG. 6 is an exemplary view in which FIGS. 4 and 5 are overlapped in order to divide a video file by subject according to the present invention.
FIG. 7 is a conceptual diagram of a file divided into sectors through a specific embodiment of the method for generating subject-based split positions of a video file according to the present invention.

The present invention relates to a method for generating split positions that divide a video file into sectors by subject. Depending on its container format, the video may be a file in a format such as AVI (Audio Video Interleaved), MPEG (Moving Picture Experts Group), ASF (Advanced Streaming Format), or WMV (Windows Media Video). Depending on its purpose, the file may be a lecture, news, an advertisement, or the like; for convenience of description, the embodiment of the present invention uses a lecture file, which has a clear subject-by-subject structure, as an example.

In the present invention, extraction of the file's sound source and images, word extraction, text extraction, sector division, and the like are performed by a dedicated program. The dedicated program is installed on hardware and operates through the hardware's various control means, storage means, and the like.

A specific embodiment of the method for generating subject-based split positions of a video file described above is explained below with reference to the accompanying drawings.

FIG. 2 is a flowchart showing a specific embodiment of the method for generating subject-based split positions of a video file according to the present invention; FIG. 3 is an exemplary view showing the accumulated results of words extracted from the sound source of the video file; FIG. 4 is an exemplary graph showing the distribution of representative words over the playback section of the video file; FIG. 5 is an exemplary graph showing image positions and key words according to the image change degree of the images of the video file; FIG. 6 is an exemplary view in which FIGS. 4 and 5 are overlapped to divide the video file by subject; and FIG. 7 is a conceptual diagram of a file divided into sectors through a specific embodiment of the method.

First, as shown in FIG. 2, the method for generating subject-based split positions of a video file according to the present invention begins by executing the file to be analyzed (S110). The file is multimedia content composed of a digitized sound source, video, and the like, which can be transmitted and received over a broadband communication network or a high-speed data network and played on a digital device (computer, notebook, smartphone, PDA, etc.).

Next, the controller extracts the sound source from the file to be analyzed (S210). The sound source is extracted by the dedicated program and stored in the storage unit.
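The description only says that a dedicated program extracts the sound source. As one hedged example, the audio track could be extracted with the ffmpeg command-line tool and downmixed to 16 kHz mono 16-bit PCM WAV, a format many speech recognizers accept; the sample rate, channel count, and helper names are assumptions, not part of the patent.

```python
import subprocess
from pathlib import Path


def audio_extract_cmd(video: Path, wav: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes mono PCM WAV."""
    return [
        "ffmpeg", "-y",            # overwrite the output file without prompting
        "-i", str(video),
        "-vn",                     # discard the video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample the audio
        "-acodec", "pcm_s16le",    # 16-bit little-endian PCM
        str(wav),
    ]


def extract_audio(video: Path, wav: Path) -> None:
    # Requires ffmpeg on PATH; raises subprocess.CalledProcessError if ffmpeg fails.
    subprocess.run(audio_extract_cmd(video, wav), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect and test without invoking ffmpeg.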

Next, words are extracted from the sound source extracted in step S210 (S220). The controller extracts words from the sound source at the set time interval and stores them in the storage unit.

Word extraction may be performed by various conventionally known methods. For example, the speech may be treated as a kind of pattern, and words may be extracted by measuring the similarity between the input pattern and the patterns registered in a database. Alternatively, the process by which speech is produced may be modeled, a unique model assigned to each target word or phoneme, and words extracted by determining which speech model has the highest probability of having produced the input speech.
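The first approach above, comparing an input pattern with registered patterns, is classically implemented with dynamic time warping (DTW), which aligns two sequences spoken at different speeds. This sketch assumes each utterance has already been converted into a sequence of equal-length feature vectors (for example MFCC frames, not computed here) and that `templates` is a hypothetical dictionary holding one reference sequence per registered word.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of equal-length feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between two frames
            # Extend the cheapest of: insertion, deletion, or diagonal match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize_word(features, templates):
    """Return the registered word whose template is closest to the input under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

The model-based alternative mentioned in the same paragraph would replace the distance with a per-word likelihood and pick the most probable model instead of the nearest template.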

Next, the extracted words are corrected (S230). The correction normalizes the extracted words against the various pronunciation variants that occur in conversational speech. That is, information on the pronunciation variants may be built into a database and used to correct consonants and vowels, thereby normalizing the extracted words; alternatively, a language model database may be built and the most probable word matched in order to normalize the extracted words.
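A minimal sketch of the database-driven correction, assuming a hypothetical variant table that maps recognized surface forms to a canonical word, with a fallback to the closest vocabulary word by string similarity using the standard-library `difflib`. String similarity is a swapped-in stand-in for the language-model matching described above, and the 0.8 cutoff is an arbitrary illustrative value.

```python
import difflib


def normalize_word(word, variants, vocabulary, cutoff=0.8):
    """Map a recognized word to its canonical form.

    variants: dict of known pronunciation variants -> canonical word (hypothetical table).
    vocabulary: list of canonical words used as a fallback for near misses.
    """
    if word in variants:
        return variants[word]
    close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return close[0] if close else word  # leave words with no close match unchanged
```

For example, with `vocabulary=["derivative", "integral"]`, a misrecognized "derivatve" is mapped to "derivative", while an unrelated word is passed through as-is.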

Next, the accumulated values of the corrected extracted words are stored (S240). An accumulated value expresses the frequency of occurrence of an extracted word numerically; in the embodiment of the present invention, the accumulated values are calculated at the set time interval to allow time-series analysis of the extracted words.

Taking FIG. 3 as an example of the result of step S240: the playback section from the start to the end of the file is divided into 30 s intervals, and the extracted words and their accumulated values are stored for each interval. For the interval 0 to 30 s, the accumulated values 'A-6 (extracted word-accumulated value), B-2, C-1, D-0, E-0, F-1, …' are stored, and for the interval 30 s to 60 s, 'A-3, B-1, C-0, D-0, E-2, F-1, …' are stored. In this way, the accumulated values of the extracted words are stored for each interval.
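The accumulation of step S240 can be sketched as bucketing time-stamped words into fixed windows. This assumes the recognizer returns `(time_in_seconds, word)` pairs, an assumed format; the 30 s window matches the FIG. 3 example, and `accumulate` is a hypothetical helper name.

```python
import math
from collections import Counter


def accumulate(words, duration_s, interval_s=30.0):
    """Count word occurrences per fixed playback interval.

    words: iterable of (time_s, word) pairs from speech recognition (assumed format).
    Returns a list of Counters; element k covers [k*interval_s, (k+1)*interval_s).
    """
    n = max(1, math.ceil(duration_s / interval_s))
    counts = [Counter() for _ in range(n)]
    for t, w in words:
        k = min(int(t // interval_s), n - 1)  # clamp a word at the very end into the last interval
        counts[k][w] += 1
    return counts
```

Words with a zero count (such as D in FIG. 3) are simply absent from a `Counter`, which still reports 0 when queried.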

Next, a distribution analysis of the representative words is performed (S250). The representative words capture the subjects of the file's content. They may be set by relative rank, after sorting the extracted words in descending order of accumulated value, or the controller may apply an absolute threshold to the accumulated values and set every extracted word whose accumulated value is at or above that threshold as a representative word.
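Both selection rules above can be sketched in a few lines: a relative rule that keeps the `top_k` most frequent words overall, and an absolute rule that keeps every word whose total count reaches `min_count`. The function name and both parameters are illustrative assumptions; the patent does not give concrete values.

```python
from collections import Counter


def representative_words(interval_counts, top_k=None, min_count=None):
    """Pick representative words from per-interval Counters.

    Give top_k for a relative ranking, or min_count for an absolute threshold.
    """
    totals = Counter()
    for c in interval_counts:
        totals.update(c)  # add this interval's counts to the running totals
    if top_k is not None:
        return [w for w, _ in totals.most_common(top_k)]
    if min_count is not None:
        return sorted(w for w, n in totals.items() if n >= min_count)
    raise ValueError("give top_k or min_count")
```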

The distribution analysis represents the time-series change in the accumulated distribution of each representative word, from which a graph with time on the horizontal axis and accumulated value on the vertical axis can be generated.

For example, referring to the graph of FIG. 4 generated by the distribution analysis: if the set representative words are 'A', 'G', and 'X' and their accumulated values over time are distributed as shown, the analysis shows that the accumulated value of 'A' is high in the first part of the playback section, that of 'G' is high in the middle part, and that of 'X' is high in the latter part. This makes it possible to identify the subject of any particular part of the file's playback section.

Next, the controller extracts the video (images) from the file to be analyzed (S310). The images are extracted by the dedicated program and stored in the storage unit.
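Image extraction for S310 is likewise left unspecified. One common approach, assumed here, is to sample still frames at a fixed rate with ffmpeg's `fps` video filter, so that the later steps (text recognition and change detection) work on individual images in playback order. The one-frame-per-second default and the helper names are illustrative.

```python
import subprocess
from pathlib import Path


def frame_sample_cmd(video: Path, out_dir: Path, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that writes numbered PNG frames at a fixed sampling rate."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        "-vf", f"fps={fps:g}",            # resample the video to a fixed frame rate
        str(out_dir / "frame_%05d.png"),  # images numbered from 1 in playback order
    ]


def sample_frames(video: Path, out_dir: Path, fps: float = 1.0) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    # Requires ffmpeg on PATH; raises subprocess.CalledProcessError on failure.
    subprocess.run(frame_sample_cmd(video, out_dir, fps), check=True)
```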

Text is then recognized and extracted from the extracted images (S320), and the controller stores the extracted text in the storage unit. Text recognition and extraction may be performed by various known techniques. For example, text may be extracted using features such as the frequency of vertical or horizontal edge pixels and the fundamental frequency obtained by a Fourier transform, which reflect the characteristic spatial frequency that distinguishes text from other parts of the image; or it may be extracted using coefficient values calculated from the facts that text contrasts with the background and stays at the same position in the video for a predetermined time or longer.
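The edge-frequency cue can be illustrated without any imaging library. This simplified sketch omits the Fourier-transform feature, the temporal-persistence cue, and the OCR step itself: it splits a grayscale frame (a 2-D list of 0-255 values) into square blocks and flags blocks in which many horizontally adjacent pixels differ sharply, that is, blocks dense in vertical edges. Flagged blocks would then be passed to an OCR engine. All thresholds and the function name are illustrative assumptions.

```python
def text_candidate_blocks(gray, block=8, grad_thresh=40, density_thresh=0.2):
    """Return (block_row, block_col) indices of blocks with a high density of strong vertical edges.

    gray: 2-D list of pixel intensities (0-255). All thresholds are illustrative.
    """
    h, w = len(gray), len(gray[0])
    candidates = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            strong = 0
            for y in range(by, by + block):
                for x in range(bx, bx + block - 1):
                    # Intensity jump between horizontally adjacent pixels inside the block.
                    if abs(gray[y][x + 1] - gray[y][x]) >= grad_thresh:
                        strong += 1
            density = strong / (block * (block - 1))
            if density >= density_thresh:
                candidates.append((by // block, bx // block))
    return candidates
```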

Next, the image change degree of the images is compared with the setting value (S330). Specifically, for a video in which an instructor lectures by writing on a blackboard, the lecture video can be divided according to the change in the image occupancy of the blackboard, which depends on the amount of writing. If the instructor's figure affects the image occupancy of the blackboard, the occupancy can be calculated with a known technique that recognizes the person in the image and reflects the extracted value in the occupancy.

The image change degree expresses numerically the proportion of a video frame occupied by an image set by the controller; for a lecture video, the image set by the controller is the blackboard.

The setting value is a numerical value for determining the position of a change point at which the content of the video switches according to the image change degree; in a lecture video, it may correspond to the case where all the writing on the blackboard has been erased.
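A hedged sketch of S330 for the blackboard case follows. It assumes a precomputed boolean blackboard mask and, optionally, a person mask from some person detector, both hypothetical inputs. Writing occupancy is taken as the fraction of visible board pixels that are bright chalk, and an erase point is reported when occupancy drops by at least `set_value` relative to its peak since the last erase. Pixels covered by the lecturer are excluded so that occlusion is not mistaken for erasing. The brightness threshold and the 0.9 setting value are illustrative choices.

```python
def ink_ratio(frame, board_mask, person_mask=None, ink_thresh=200):
    """Fraction of visible blackboard pixels that are bright chalk strokes.

    frame: 2-D grayscale list; board_mask / person_mask: 2-D lists of bools (assumed inputs).
    """
    visible = ink = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if not board_mask[y][x] or (person_mask and person_mask[y][x]):
                continue  # outside the board, or hidden behind the lecturer
            visible += 1
            if value >= ink_thresh:
                ink += 1
    return ink / visible if visible else 0.0


def erase_points(ratios, times, set_value=0.9):
    """Times where writing drops by at least set_value relative to the peak since the last erase."""
    points, peak = [], 0.0
    for r, t in zip(ratios, times):
        if peak > 0 and (peak - r) / peak >= set_value:
            points.append(t)
            peak = r  # a new, nearly empty board starts here
        else:
            peak = max(peak, r)
    return points
```

Measuring the drop relative to the running peak, rather than between consecutive frames only, lets a gradual erase spread over several sampled frames still register once the board is nearly clear.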

If, as a result of the comparison in step S330, the image change degree is equal to or greater than the setting value, the image position and the key word are stored (S340). The image position indicates the point in the file's playback section at which the image changes, and it serves as the reference for dividing the video file into sections by subject.

The key word represents the core content of each section. The text with the highest accumulated value among the texts stored in step S320 may be set as the key word, or text extracted at a specific position may be set as the key word.

For example, referring to FIG. 5, if a variable n is assigned to each point at which the change degree of the image set by the controller is equal to or greater than the setting value, the file is divided into four sections from its start (n=0) to its end (END, n=4), and the values of n mark the start and end positions of each section.

Here, n=4 means that the image changed four times within the playback range of the video; as a concrete example, it may mean that the writing on the board was erased and replaced with new writing four times.

The key word is 'A' in the first section (n=0 to n=1), 'G' in the second section (n=1 to n=2), 'G' in the third section (n=2 to n=3), and 'X' in the fourth section (n=3 to END).
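The key-word selection of S340 can be sketched as "most frequent recognized text within each section between image change positions", reproducing the FIG. 5 pattern of A, G, G, X. The input format `(time_s, word)` for recognized text and the name `key_words` are assumptions for illustration.

```python
from bisect import bisect_right
from collections import Counter


def key_words(ocr_texts, change_times, end_time):
    """Most frequent recognized text in each section between image change positions.

    ocr_texts: list of (time_s, word) from text recognition (assumed format).
    change_times: sorted image positions n=1..N-1; the file start is n=0, end_time is END.
    Returns a list of (start, end, key_word or None), one per section.
    """
    bounds = [0.0] + list(change_times) + [end_time]
    per_section = [Counter() for _ in range(len(bounds) - 1)]
    for t, word in ocr_texts:
        k = bisect_right(bounds, t) - 1  # index of the section containing time t
        if 0 <= k < len(per_section):
            per_section[k][word] += 1
    result = []
    for k, counts in enumerate(per_section):
        top = counts.most_common(1)
        result.append((bounds[k], bounds[k + 1], top[0][0] if top else None))
    return result
```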

On the other hand, if the comparison in step S330 shows that the image change degree is less than the setting value, step S320 is repeated.

Next, the controller generates the sector division of the file (S410). The sector division may be achieved by overlapping the results of steps S250 and S340.

Describing the overlapping with reference to FIG. 6: sectors classify the video file by the subject of each section, and they can be divided by matching the key word of each section, delimited by the image change degree, against the representative word set from the accumulated values of the words extracted from the sound source over playback time.

That is, the video file is basically divided according to the change in the representative word analyzed in step S250, while the exact split position is set from the image position calculated in step S340. Each split position can be verified by checking whether the representative word and the key word of the section match.

If the key word of a divided section differs from the representative word extracted from the sound source, the representative word or the key word can be reset to another of the words or texts extracted in steps S210 to S250 or in steps S310 to S340, and the two matched again, to divide the sectors.
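The overlapping step leaves several choices open, so the following is only one hypothetical reading of steps (h2) and (h3) together with the two fallbacks described above. For each image-based section, the most frequent audio word (the representative word) is compared with the section's key word. On a mismatch, the sketch first tries whether the key word appears among the other audio words, then whether the representative word appears among the other on-screen texts. It additionally assumes that adjacent sections ending up with the same subject are merged into one sector, so a split survives only where the subject changes; under that assumption the FIG. 5 example would keep the splits at n=1 and n=3. Function names and input shapes are illustrative.

```python
from collections import Counter


def tag_section(key_word, ocr_counts, audio_counts):
    """Pick a subject tag for one image-based section, following steps (h2)-(h3)."""
    ranked = [w for w, _ in audio_counts.most_common()]
    representative = ranked[0] if ranked else None
    if representative is not None and representative == key_word:
        return representative      # (h3): representative word matches key word
    if key_word in audio_counts:
        return key_word            # retry with another word extracted from the audio
    if representative in ocr_counts:
        return representative      # retry with another text extracted from the images
    return None                    # no agreement: leave the section untagged


def split_by_subject(sections, audio_counts):
    """sections: list of (start, end, key_word, ocr_counts); audio_counts: one Counter per section.

    Returns a list of (start, end, subject_tag) sectors.
    """
    sectors = []
    for (start, end, key_word, ocr_counts), audio in zip(sections, audio_counts):
        tag = tag_section(key_word, ocr_counts, audio)
        if sectors and tag is not None and sectors[-1][2] == tag:
            sectors[-1] = (sectors[-1][0], end, tag)  # same subject: extend the sector
        else:
            sectors.append((start, end, tag))         # subject changes: keep the split
    return sectors
```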

As shown in FIG. 7, flags identifying the positions at which the subject of the lecture changes are placed in the playback section of the sector-divided file. The learner can select and play a flag to take the desired lecture selectively or repeatedly, or can be provided with the file as several small lecture files divided by subject.
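One concrete way to store such flags, assumed here rather than taken from the patent, is as chapters in ffmpeg's FFMETADATA text format, which ffmpeg can read as an extra input when remuxing the lecture file. The sketch renders `(start_s, end_s, title)` sectors with a millisecond timebase and backslash-escapes the characters that the format treats specially (`=`, `;`, `#`, `\` and newline).

```python
def ffmetadata_chapters(sectors):
    """Render (start_s, end_s, title) sectors as an FFMETADATA1 chapter list (ms timebase)."""
    def escape(text):
        # Escape the backslash first so the escapes added afterwards are not doubled.
        for ch in ("\\", "=", ";", "#", "\n"):
            text = text.replace(ch, "\\" + ch)
        return text

    lines = [";FFMETADATA1"]
    for start, end, title in sectors:
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",
            f"START={round(start * 1000)}",
            f"END={round(end * 1000)}",
            f"title={escape(title or 'untitled')}",
        ]
    return "\n".join(lines) + "\n"
```

Chapters suit the "select a flag and play" use; the alternative of separate small files would instead cut the video at the same start and end times.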

The scope of the present invention is not limited to the embodiment described above but is defined by the claims, and it is self-evident that those of ordinary skill in the art can make various modifications and adaptations within the scope of the claims.

The present invention relates to a method for generating split positions that divides a video file into sections by subject by analyzing its sound source and images. According to the present invention, for a video file covering various subjects, the playback section is split by matching keywords extracted from the sound source and the images of the video file, so the subject of each split video can be made clear.

Claims

1. A method for generating subject-based split positions of a video file, comprising:
(a) extracting a sound source from a file to be analyzed;
(b) dividing the sound source into set time intervals and extracting words included in the sound source;
(c) calculating accumulated values of the extracted words and analyzing their distribution;
(d) extracting images from the file to be analyzed;
(e) recognizing the images and extracting text;
(f) comparing the image change degree of the images with a setting value;
(g) if, as a result of the comparison in step (f), the image change degree is equal to or greater than the setting value, storing the image position and storing the text extracted from the section preceding that image position; and
(h) generating split positions of the file to be analyzed based on the change in distribution of the extracted words of step (c) and the image positions and text stored in step (g).

2. The method of claim 1,
wherein the distribution analysis of the extracted words in step (c)
sets a representative word according to the extraction frequency of the extracted words and analyzes the distribution of the number of extractions of the representative word over playback time.

3. The method of claim 1 or 2,
wherein the image change degree is a numerical value of the rate of change of a reference image in the images, and
the setting value is a reference value for determining, from the image change degree, whether the content of the video has changed.

4. The method of claim 3,
wherein the text storage of step (g)
selects and stores, from among the extracted texts, a key word chosen based on its number of repetitions within the section.

5. The method of claim 4,
wherein the selection of split positions in step (h) comprises:
(h1) calculating the positions at which the representative word changes, based on the change over time of the representative words extracted from the sound source;
(h2) comparing the representative word with the key word extracted from the images; and
(h3) if the representative word matches the key word, setting the stored image position as a split position and setting a tag that marks the representative word as the subject of the corresponding section.

6. The method of claim 5,
wherein, in step (h3), if the representative word and the key word do not match, step (h2) is performed after changing the representative word to any one of the words extracted in step (c).

7. The method of claim 5,
wherein, in step (h3), if the representative word and the key word do not match, step (h2) is performed after changing the key word to any one of the words in the texts extracted in step (g).

delete