KR20240082137A

KR20240082137A - Method and apparatus for providing text information including text extracted from content including image

Info

Publication number: KR20240082137A
Application number: KR1020230017484A
Authority: KR
Inventors: 윤승섭; 박준현; 이지훈; 김효재; 현재호; 임승택; 오승욱; 최가인; 김혜경; 이상국; 장태영; 오승근; 강태종; 이순형; 김형규
Original assignee: 네이버웹툰 유한회사
Priority date: 2022-12-01
Filing date: 2023-02-09
Publication date: 2024-06-10

Abstract

Provided is a method of providing text information, the method including: identifying content that includes an image uploaded to a content server; extracting text from the image included in the identified content; and providing text information including the extracted text, as text information associated with the content, in response to a request from an administrator of the content or a consumer of the content.

Description

A method and device for providing text information including text extracted from content including an image {METHOD AND APPARATUS FOR PROVIDING TEXT INFORMATION INCLUDING TEXT EXTRACTED FROM CONTENT INCLUDING IMAGE}

The present disclosure relates to a method and apparatus for providing text information including text extracted from content that includes an image, and more particularly to a method and apparatus for extracting text corresponding to dialogue from content that includes cuts and dialogue, such as webtoon content, and providing text information including that text.

Interest is growing in services that provide image-based content online, such as comic, cartoon, or webtoon services. The images in such content include text that advances the story, namely dialogue uttered by the characters and text that explains the story, and they also include text unrelated to the progress of the story. Text unrelated to the story is, for example, a sound effect, background decoration, or text on an object in the image; unlike dialogue uttered by a character or text that explains the story, it does not affect the progress of the story.

Text extracted from content may be provided together with the content as information associated with it. For example, it may be provided to an administrator who manages the content at the administrator's request, or to a consumer who consumes the content at the consumer's request.

Here, the text provided on request, as text information associated with the content, should not simply present all of the text extracted from the content; it needs to be processed so that it effectively conveys information about the content, such as its story. In addition, text unrelated to the story should be excluded during extraction so that only meaningful text is included in the text information.

Korean Patent No. 10-2374280 (registered March 10, 2022) discloses a system and method for grouping text extracted from an image into blocks.

The information described above is provided only to aid understanding. It may include content that does not form part of the prior art, and it may not include what the prior art could suggest to a person skilled in the art.

One embodiment may provide a method of providing text information, the method including identifying content that includes an image uploaded to a content server, extracting text from the image included in the identified content, and providing text information including the extracted text, as text information associated with the content, at the request of an administrator or a consumer of the content.

One embodiment may provide a method that detects a plurality of cuts included in an image, generates a cut image containing each cut, and extracts text from the cut images corresponding to the plurality of cuts, wherein a dialogue region containing dialogue is detected in each cut image, the text in each detected dialogue region is extracted using OCR (Optical Character Recognition), and text information is generated based on the extracted text.

In one aspect, there is provided a method of providing text information associated with content, performed by a computer system, the method including: identifying content that includes an image uploaded to a content server; extracting text from the image included in the content; and providing text information including the extracted text as text information associated with the content.

The image may include a plurality of ordered cuts of the content and text including dialogue of the content, the extracted text may be the dialogue extracted from the text included in the image, and the text information may include each of a plurality of lines of the dialogue together with order information for each line.

The extracting of the text may include: detecting the plurality of cuts in the image; generating a cut image containing each of the plurality of cuts; and extracting text from the cut images corresponding to the plurality of cuts.

The plurality of cuts may be included in the image in the order in which they are scrolled in the vertical direction, and each cut image may be configured to further include a blank area of a predetermined size above and below the cut.
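The following is a minimal sketch of generating a cut image with blank margins above and below the cut. It makes assumptions that the disclosure does not state here: the vertically scrolled image is modeled as a list of grayscale pixel rows, a cut is taken to be any run of rows that are not entirely blank, and the names `find_cut_spans`, `make_cut_image`, and `margin` are hypothetical.

```python
from typing import List, Tuple

Row = List[int]  # one row of grayscale pixels; 255 is assumed to be blank background


def find_cut_spans(strip: List[Row], blank: int = 255) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of non-blank runs.

    Assumption: cuts are separated by fully blank rows. A real detector
    would likely be more robust than this.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(strip):
        row_is_blank = all(pixel == blank for pixel in row)
        if not row_is_blank and start is None:
            start = y                     # a cut begins on this row
        elif row_is_blank and start is not None:
            spans.append((start, y))      # the cut ended on the previous row
            start = None
    if start is not None:                 # the last cut runs to the end of the strip
        spans.append((start, len(strip)))
    return spans


def make_cut_image(strip: List[Row], span: Tuple[int, int],
                   margin: int, blank: int = 255) -> List[Row]:
    """Copy the rows of one cut and add `margin` blank rows above and below."""
    top, bottom = span
    width = len(strip[0]) if strip else 0

    def padding() -> List[Row]:
        return [[blank] * width for _ in range(margin)]

    return padding() + [list(row) for row in strip[top:bottom]] + padding()


strip = [[255, 255], [0, 255], [0, 0], [255, 255], [255, 255], [10, 255]]
spans = find_cut_spans(strip)
first_cut = make_cut_image(strip, spans[0], margin=2)
```

The margin size stands in for the "predetermined size" of the blank area; its value here is arbitrary.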

The extracting of text from the cut images may include: detecting, for each cut image, a dialogue region containing the dialogue; extracting the text included in each detected dialogue region using OCR (Optical Character Recognition); and generating the text information based on the text extracted from each detected dialogue region. The dialogue region may be a speech bubble included in the image, a region containing a monologue or narration by a speaker or character of the content, or a region containing text that explains the content. The text information may include, as the order information, information on which dialogue region and which cut the text extracted from each detected dialogue region came from.
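The three steps above (detect dialogue regions, run OCR per region, build text information) can be sketched as a small pipeline. This is an illustration only: the region detector and the OCR engine are passed in as callables because the disclosure does not name a specific model or library here, and the function name `extract_text_info` and the dictionary keys are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List


def extract_text_info(
    cut_images: Iterable[Any],
    detect_regions: Callable[[Any], List[Any]],  # hypothetical detector: cut image -> regions
    ocr: Callable[[Any], str],                   # hypothetical OCR: region -> recognized text
) -> List[Dict[str, Any]]:
    """Run OCR per dialogue region and record where each line came from."""
    info: List[Dict[str, Any]] = []
    for cut_no, cut in enumerate(cut_images, start=1):
        for region_no, region in enumerate(detect_regions(cut), start=1):
            # Each recognized line keeps its cut, region, and row numbers.
            for row_no, line in enumerate(ocr(region).splitlines(), start=1):
                info.append(
                    {"cut": cut_no, "region": region_no, "row": row_no, "text": line}
                )
    return info


# Stand-in detector and OCR so the sketch runs without any image library.
fake_detect = lambda cut: [cut + "-r1"] if cut == "c1" else []
fake_ocr = lambda region: "hi\nthere"
result = extract_text_info(["c1", "c2"], fake_detect, fake_ocr)
```

Injecting the detector and OCR keeps the bookkeeping of cut, region, and row numbers separate from whichever recognition components are actually used.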

The order information may further include row information indicating the row, within the corresponding dialogue region, of the text extracted from each detected dialogue region.

Each of the plurality of cuts may be assigned a first order number that is earlier the higher the cut is located in the vertical direction within the image and, for cuts at the same vertical position, earlier the further to the left or right the cut is located. Each dialogue region detected in a cut image may be assigned a second order number that is earlier the higher the region is located in the vertical direction within that cut image and, for regions at the same vertical position, earlier the further to the left or right the region is located. Each line of the text extracted from a detected dialogue region may be assigned, as the row information, a third order number that is earlier the higher the line is located in the vertical direction. The text information may include, as the order information, the first order number, the second order number, and the third order number for the text extracted from each dialogue region.
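The ordering rule described above (higher first; at the same height, left or right first) can be sketched as a sort key. The `Box` type, the `assign_order` function, and the `left_first` flag are hypothetical names introduced for illustration; the same helper could be applied to cuts within the image (first order number) and to dialogue regions within a cut image (second order number).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    x: int  # horizontal position of the top-left corner
    y: int  # vertical position of the top-left corner (smaller = higher)


def assign_order(boxes: List[Box], left_first: bool = True) -> List[int]:
    """Return a 1-based order number for each box, in input order.

    Boxes higher in the image come first; boxes at the same height are
    ordered left-to-right, or right-to-left when left_first is False.
    """
    def key(index: int):
        box = boxes[index]
        return (box.y, box.x if left_first else -box.x)

    ranked = sorted(range(len(boxes)), key=key)
    order = [0] * len(boxes)
    for rank, index in enumerate(ranked, start=1):
        order[index] = rank
    return order


boxes = [Box(50, 100), Box(10, 0), Box(10, 100)]
ltr = assign_order(boxes)                    # left-to-right tie-break
rtl = assign_order(boxes, left_first=False)  # right-to-left tie-break
```

The `left_first` switch reflects that the text allows either the left or the right side to take precedence, which in practice would depend on the reading direction of the work.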

The extracting of text from the cut images may further include generating a virtual speech bubble corresponding to a first region or a second region when the detected dialogue region is the first region, which contains the monologue or narration, or the second region, which contains the explanatory text. The order information may include information on which of the speech bubbles the text extracted from each detected dialogue region came from, based on the order within the image of the speech bubbles, which comprise the speech bubbles corresponding to detected dialogue regions and the virtual speech bubbles.
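As a rough sketch of the virtual speech bubble idea, narration and explanatory regions can be wrapped in the same structure as real speech bubbles so that all of them share one ordering. The `Region` and `Bubble` types, the `kind` labels, and the top-then-left sort are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    kind: str  # assumed labels: "bubble", "narration", or "description"
    x: int
    y: int
    w: int
    h: int


@dataclass(frozen=True)
class Bubble:
    x: int
    y: int
    w: int
    h: int
    virtual: bool  # True when no drawn speech bubble exists for the region


def to_bubbles(regions: List[Region]) -> List[Bubble]:
    """Give every dialogue region a bubble, virtual if needed, in image order."""
    bubbles = [
        Bubble(r.x, r.y, r.w, r.h, virtual=(r.kind != "bubble")) for r in regions
    ]
    # Order by position in the image: top first, then left.
    return sorted(bubbles, key=lambda b: (b.y, b.x))


regions = [Region("bubble", 5, 40, 10, 10), Region("narration", 0, 0, 20, 5)]
bubbles = to_bubbles(regions)
```

Treating every dialogue region as a bubble means the later ordering and bookkeeping steps need only one code path.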

The extracting of text from the cut images may include: generating a single integrated dialogue region image by merging the dialogue regions detected from the cut images; and extracting, using OCR (Optical Character Recognition), the text included in each of the dialogue regions contained in the integrated dialogue region image.
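One possible way to build an integrated dialogue region image is to stack the cropped regions vertically on a single canvas and remember where each one was placed, so that OCR results can be mapped back to their regions. The layout (vertical stacking with a blank gap) and the names `merge_regions`, `gap`, and `offsets` are assumptions of this sketch, not details given in the text.

```python
from typing import List, Tuple

Crop = List[List[int]]  # a cropped dialogue region as rows of grayscale pixels


def merge_regions(crops: List[Crop], blank: int = 255,
                  gap: int = 1) -> Tuple[Crop, List[Tuple[int, int]]]:
    """Stack crops vertically; return the canvas and (top_row, height) per crop."""
    if not crops:
        return [], []
    width = max(len(crop[0]) for crop in crops)  # assumes non-empty crops
    canvas: Crop = []
    offsets: List[Tuple[int, int]] = []
    for crop in crops:
        offsets.append((len(canvas), len(crop)))
        for row in crop:
            # Right-pad narrower crops so every canvas row has the same width.
            canvas.append(list(row) + [blank] * (width - len(row)))
        # Blank separator rows keep neighboring regions apart.
        canvas.extend([blank] * width for _ in range(gap))
    return canvas, offsets


canvas, offsets = merge_regions([[[0, 0, 0]], [[1], [1]]])
```

Running OCR once over a merged image instead of once per cut is a plausible motivation for this step, although the text above does not state the reason.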

The detecting of the dialogue region may include: detecting regions containing text in each cut image; identifying, among those regions, a non-dialogue region, which is a region containing text that belongs to the background of the cut, text that represents a sound effect of the content, and text determined to be unrelated to the story of the content; and detecting the regions that remain after excluding the non-dialogue region as dialogue regions containing dialogue.
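The exclusion step can be illustrated with a simple filter. The sketch assumes that some upstream classifier has already labeled each text region; the label strings and the `select_dialogue_regions` function are hypothetical, and how the labels are produced is outside the scope of this example.

```python
from typing import Dict, List

# Assumed labels for the three kinds of non-dialogue text named in the text.
NON_DIALOGUE_LABELS = {"background", "sound_effect", "unrelated"}


def select_dialogue_regions(regions: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the text regions that are not labeled as non-dialogue."""
    return [r for r in regions if r["label"] not in NON_DIALOGUE_LABELS]


text_regions = [
    {"id": "r1", "label": "speech_bubble"},
    {"id": "r2", "label": "sound_effect"},
    {"id": "r3", "label": "narration"},
    {"id": "r4", "label": "background"},
]
dialogue_regions = select_dialogue_regions(text_regions)
```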

In the providing, the text information may be provided to an administrator terminal that manages the content in response to a request from the administrator terminal. The method may further include providing the administrator terminal with a function for reviewing the text information, and the review function may include at least one of a first function for editing the text information, a second function for downloading the text information, and a third function for setting whether the text information can be updated.

The review function may include the first function, and the providing of the review function may include: displaying, on the administrator terminal, a first cut selected by the administrator from among the plurality of cuts together with the text information including the dialogue extracted from the selected first cut; providing a first user interface for editing the displayed text information; and providing a second user interface for switching from the first cut to a second cut, which is another of the plurality of cuts.

In the providing, audio information corresponding to the text information may be provided to a consumer terminal that consumes the content in response to a request from the consumer terminal.

The providing may include: retrieving the text information associated with the content when the consumer terminal requests to view the content; recognizing which of the plurality of cuts the consumer terminal is viewing; and causing the consumer terminal to output audio information corresponding to the portion of the text information that corresponds to the recognized cut.
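A minimal sketch of selecting the text to be voiced for the cut currently being viewed follows. How the viewed cut is recognized is not specified in the passage above; this example assumes it is derived from the vertical scroll offset and the top coordinate of each cut. The names `cut_in_view` and `lines_for_cut` are hypothetical, and speech synthesis itself is left out.

```python
import bisect
from typing import Dict, List, Union

Entry = Dict[str, Union[int, str]]


def cut_in_view(cut_tops: List[int], scroll_y: int) -> int:
    """Return the 1-based number of the last cut whose top is at or above scroll_y.

    cut_tops must be sorted in ascending order.
    """
    index = bisect.bisect_right(cut_tops, scroll_y) - 1
    return max(index, 0) + 1


def lines_for_cut(text_info: List[Entry], cut_no: int) -> List[str]:
    """Collect the dialogue lines of one cut, in their stored order."""
    return [str(e["text"]) for e in text_info if e["cut"] == cut_no]


text_info = [
    {"cut": 1, "text": "Where are we?"},
    {"cut": 2, "text": "Almost there."},
    {"cut": 2, "text": "Hold on."},
]
cut_tops = [0, 800, 1500]
viewed = cut_in_view(cut_tops, scroll_y=900)
to_speak = lines_for_cut(text_info, viewed)
```

The resulting lines would then be handed to whatever text-to-speech component the service uses.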

The method of providing the text information may further include: monitoring whether the content on the content server has been updated or deleted; extracting text from the image included in the updated content when an update of the content is identified; and deleting the text information associated with the content when deletion of the content is identified.
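The monitoring behavior can be sketched by comparing two snapshots of the content server. The snapshot format (content ID mapped to a version string), the `sync_text_info` function, and the `extract` callable are assumptions for illustration; the text does not describe how updates and deletions are detected.

```python
from typing import Callable, Dict


def sync_text_info(
    previous: Dict[str, str],       # content ID -> version seen at the last check
    current: Dict[str, str],        # content ID -> version seen now
    store: Dict[str, str],          # content ID -> stored text information
    extract: Callable[[str], str],  # hypothetical re-extraction for one content ID
) -> None:
    """Re-extract text for new or updated content; drop it for deleted content."""
    for content_id, version in current.items():
        if previous.get(content_id) != version:
            store[content_id] = extract(content_id)  # uploaded or updated
    for content_id in previous.keys() - current.keys():
        store.pop(content_id, None)                  # deleted from the server


previous = {"ep1": "v1", "ep2": "v1"}
current = {"ep1": "v2", "ep3": "v1"}
store = {"ep1": "old", "ep2": "old"}
sync_text_info(previous, current, store, extract=lambda cid: f"text:{cid}")
```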

The method of providing the text information may further include determining the speaker, in the content, who uttered the text extracted from each detected dialogue region, where the speaker is determined based on at least one of a speaker image depicted in the image in association with the speech bubble corresponding to the detected dialogue region and the color or shape of that speech bubble. The text information generated based on the text extracted from each detected dialogue region may further include information on the determined speaker.
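As one illustration of color-based speaker determination, a speech bubble's fill color can be matched against a per-character palette. This covers only the color cue; the speaker image and bubble shape cues mentioned above are not modeled. The palette, the character names, and the `nearest_speaker` function are hypothetical.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]


def nearest_speaker(bubble_rgb: RGB, palette: Dict[str, RGB]) -> str:
    """Return the character whose bubble color is closest to bubble_rgb."""
    def squared_distance(a: RGB, b: RGB) -> int:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(palette, key=lambda name: squared_distance(palette[name], bubble_rgb))


# Hypothetical palette: each character is assumed to have a typical bubble color.
palette = {"Character A": (255, 255, 255), "Character B": (255, 220, 220)}
speaker = nearest_speaker((250, 225, 225), palette)
```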

In another aspect, there is provided a computer system that provides text information associated with content, the computer system including at least one processor implemented to execute computer-readable instructions, wherein the at least one processor identifies content that includes an image uploaded to a content server, extracts text from the image included in the content, and provides text information including the extracted text as text information associated with the content.

Text information that includes only the text corresponding to the dialogue of the content, excluding text that belongs to the background of each cut, text that represents sound effects, and text determined to be unrelated to the story of the content, can be provided to an administrator or a consumer of the content as text information associated with the content.

An administrator can receive the text information associated with the content and review and edit the text extracted from each cut, and a consumer of the content can, when viewing the content, receive audio information corresponding to the text information associated with the viewed content.

A dialogue region can be detected in each cut image corresponding to a cut of the content, and text can be extracted from each detected dialogue region. Order information indicating which dialogue region and which cut the extracted text came from, and which row within the dialogue region it belongs to, can be processed together with the extracted text into text information associated with the content.

FIG. 1 shows a method of extracting text from an image of content and providing text information including the extracted text, according to an embodiment.
FIG. 2 shows a computer system, a consumer terminal, and a content server that perform a method of providing text information, according to an embodiment.
FIG. 3 is a flowchart showing a method of extracting text from an image of content and providing text information including the extracted text, according to an embodiment.
FIG. 4 is a flowchart showing a method of extracting text from an image of content, according to an example.
FIG. 5 is a flowchart showing a method of generating an integrated dialogue region image by merging dialogue regions extracted from cut image(s) and extracting text from the integrated dialogue region image, according to an example.
FIG. 6 is a flowchart showing a method of generating text information including extracted text and order information associated with the extracted text, according to an example.
FIG. 7 is a flowchart showing a method of detecting a dialogue region from a cut image, according to an example.
FIG. 8 is a flowchart showing a method of providing text information associated with content to a consumer terminal of a consumer who consumes the content, according to an example.
FIG. 9 is a flowchart showing a method of re-extracting text to generate text information, or deleting text information, in response to an update or deletion of content, according to an example.
FIG. 10A shows a cut image corresponding to a cut of content, according to an example.
FIG. 10B shows a method of extracting a cut from an image of content, according to an example.
FIG. 11A shows a method of detecting a dialogue region from a cut image, according to an example.
FIG. 11B shows a method of detecting text from a dialogue region that is a speech bubble (or a virtual speech bubble), according to an example.
FIG. 12 shows a method of determining the order of the cuts of content and of the dialogue regions included in each cut, according to an example.
FIG. 13 shows a method of providing text information, according to an example.
FIGS. 14 and 15 show a method of viewing and reviewing text information on an administrator terminal, according to an example.
FIG. 16 shows a method of providing audio information corresponding to text information to a consumer terminal, according to an example.
FIG. 17 shows a method of extracting text from an image of content that is webtoon content and providing text information including the extracted text, according to an example.
FIG. 18 shows a method of determining the speaker or character who uttered the text included in a dialogue region, according to an example.
FIG. 19 shows an integrated dialogue region image, according to an example.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows a method of extracting text from an image of content and providing text information including the extracted text, according to an embodiment.

With reference to FIG. 1, a method of extracting text from content that includes an image 10 and generating and providing text information 50 including the extracted text is described.

The content is configured to include the image 10 and may be, for example, webtoon content. Webtoon content may include comics in the form of digital content provided over the Internet via a wired or wireless network.

Such webtoon content can be read by a consumer (or reader) scrolling through it. Webtoon content includes a plurality of cuts, and the reader can view the webtoon content by scrolling through the cuts in sequence.

The content described in the embodiments may represent a single episode of a work composed of a plurality of episodes. For example, the content may represent a specific episode of a webtoon work.

The illustrated image 10 may be at least part of the content. The image 10 may include a plurality of ordered cuts 20-1 to 20-N of the content and text including dialogue of the content. The dialogue may include at least one word uttered by a speaker (or character) in the content. The dialogue may include a monologue or narration by a speaker or character of the content. Alternatively, the dialogue may include at least one word that explains the story of the content rather than a word uttered by a particular speaker or character. Such words may be uttered by the author or narrator of the content. The dialogue included in the image 10 may be the text included in the image 10 with text unrelated to the progress of the story excluded.

A computer system (e.g., the computer system 100 described later with reference to FIG. 2) may extract the text included in the image 10 of the content and generate text information 50 including that text.

The text extracted from the image 10 may be the dialogue extracted from the text included in the image 10. The extracted text can therefore consist only of the text needed to explain the story of the content. The computer system may detect dialogue regions 12, 22, and 24 containing dialogue in the image 10 and extract text from each dialogue region. The computer system may detect dialogue regions in each of the cuts 20-1 to 20-N and extract text from each dialogue region.

As illustrated, the text information 50 generated from the extracted text may include information indicating which cut the extracted text came from (e.g., cut N, where N is an integer), information indicating which dialogue region it came from (e.g., dialogue region K, where K is an integer), and information indicating which row a line of the extracted text occupies (e.g., [R], where R is an integer). The text information 50 may thus include the extracted text and order information associated with it. In other words, the text information 50 may include each of the plurality of lines of the content's dialogue together with order information for each line.
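The structure of the text information 50 described above can be sketched as a list of records carrying a cut number N, a dialogue region number K, and a row number R. The `TextLine` type and the printed layout are assumptions for illustration; the exact presentation in FIG. 1 is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextLine:
    cut: int     # N: which cut the line was extracted from
    region: int  # K: which dialogue region within that cut
    row: int     # R: which row within that dialogue region
    text: str


def format_text_info(lines: List[TextLine]) -> str:
    """Render lines grouped by cut and dialogue region, in order."""
    out: List[str] = []
    last = (None, None)
    for line in sorted(lines, key=lambda l: (l.cut, l.region, l.row)):
        if line.cut != last[0]:
            out.append(f"Cut {line.cut}")
        if (line.cut, line.region) != last:
            out.append(f"  Dialogue region {line.region}")
        out.append(f"    [{line.row}] {line.text}")
        last = (line.cut, line.region)
    return "\n".join(out)


lines = [
    TextLine(1, 1, 2, "world"),
    TextLine(1, 1, 1, "hello"),
    TextLine(2, 1, 1, "bye"),
]
rendered = format_text_info(lines)
```

Because each record carries all three order numbers, the lines can be stored in any order and still be restored to reading order.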

Because the text information 50 is configured to include meaningful text that explains the story of the content, it may be provided to an administrator who manages the content. The administrator can use the text information 50 when serving the content to consumers. For example, a consumer of the content may, on request, receive audio information corresponding to the text information 50 when viewing the content.

A more specific method by which the computer system extracts text from the image 10 of the content and generates and provides the text information 50 including the extracted text is described in more detail with reference to FIGS. 2 to 19 below.

FIG. 2 shows a computer system, a consumer terminal, and a content server that perform a method of providing text information, according to an embodiment.

The computer system 100 may be a computing device that performs the operations required to carry out the method of providing text information according to the embodiment.

The computer system 100 may be configured to include at least one computing device. The computer system 100 may detect dialogue regions (12, 22, 24, etc.) in the image 10 included in the content, extract the text included in the dialogue regions, generate text information 50 including the extracted text, and provide the generated text information 50 to an administrator terminal (not shown) that manages the content or to a consumer terminal 160 that consumes it.

The computer system 100 may be the administrator terminal described above, or another computer device or server that communicates with the administrator terminal. The administrator terminal may provide tools for viewing and reviewing the provided text information 50.

Meanwhile, the computer system 100 may identify content uploaded to the content server 150 and perform the text extraction and provision of the text information 50 described above.

The content server 150 is a server on which content is managed. Content may be uploaded to the content server 150, and uploaded content may be updated or deleted. The content server 150 may be a service platform that provides content that is webtoon content, or part of such a platform.

The configuration of the computer system 100 is described in more detail below.

컴퓨터 시스템(100)은 도시된 것처럼, 메모리(130), 프로세서(120), 통신부(110) 및 입출력 인터페이스(140)를 포함할 수 있다.As shown, the computer system 100 may include a memory 130, a processor 120, a communication unit 110, and an input/output interface 140.

메모리(130)는 컴퓨터에서 판독 가능한 기록매체로서, RAM(random access memory), ROM(read only memory) 및 디스크 드라이브와 같은 비소멸성 대용량 기록장치(permanent mass storage device)를 포함할 수 있다. 여기서 ROM과 비소멸성 대용량 기록장치는 메모리(130)와 분리되어 별도의 영구 저장 장치로서 포함될 수도 있다. 또한, 메모리(130)에는 운영체제와 적어도 하나의 프로그램 코드가 저장될 수 있다. 이러한 소프트웨어 구성요소들은 메모리(130)와는 별도의 컴퓨터에서 판독 가능한 기록매체로부터 로딩될 수 있다. 이러한 별도의 컴퓨터에서 판독 가능한 기록매체는 플로피 드라이브, 디스크, 테이프, DVD/CD-ROM 드라이브, 메모리 카드 등의 컴퓨터에서 판독 가능한 기록매체를 포함할 수 있다. 다른 실시예에서 소프트웨어 구성요소들은 컴퓨터에서 판독 가능한 기록매체가 아닌 통신부(110)를 통해 메모리(130)에 로딩될 수도 있다. The memory 130 is a computer-readable recording medium and may include a non-permanent mass storage device such as random access memory (RAM), read only memory (ROM), and a disk drive. Here, the ROM and the non-perishable mass recording device may be separated from the memory 130 and included as a separate permanent storage device. Additionally, an operating system and at least one program code may be stored in the memory 130. These software components may be loaded from a computer-readable recording medium separate from the memory 130. Such separate computer-readable recording media may include computer-readable recording media such as floppy drives, disks, tapes, DVD/CD-ROM drives, and memory cards. In another embodiment, software components may be loaded into the memory 130 through the communication unit 110 rather than a computer-readable recording medium.

The processor 120 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 120 by the memory 130 or the communication unit 110. For example, the processor 120 may be configured to execute received instructions according to program code loaded into the memory 130.

The communication unit 110 may be a component that allows the computer system 100 to communicate with other devices (such as a user terminal or another server). In other words, the communication unit 110 may be a hardware module of the computer system 100 that transmits and receives data and/or information to and from other devices, such as an antenna, a data bus, a network interface card, a network interface chip, or a networking interface port, or may be a software module such as a network device driver or a networking program.

The input/output interface 140 may be a means for interfacing with an input device such as a keyboard or a mouse and with an output device such as a display or a speaker.

The processor 120 may manage the components of the computer system 100, and may execute a program or application for performing the above-described detection of the dialogue regions (12, 22, 24, etc.) from the image 10 of the content, the extraction of text for each dialogue region, and the generation and provision of the text information 50. The processor 120 may also handle the operations required to execute the program or application and to process data. The processor 120 may be at least one processor (such as a CPU or GPU) of the computer system 100, or at least one core within a processor.

In addition, in embodiments, the computer system 100 and the processor 120 may include more components than those illustrated. For example, the processor 120 may include components that perform functions for training a transformation model and for performing the image transformation method of the embodiment using the trained transformation model. These components of the processor 120 may be part of the processor 120 or may be functions implemented by the processor 120. The components included in the processor 120 may be representations of different functions that the processor 120 performs in accordance with control instructions from the code of the operating system or the code of at least one computer program.

The illustrated consumer terminal 160 may be a user terminal of a consumer who views the content. The user terminal may be a smart device such as a smartphone, a personal computer (PC), a notebook computer, a laptop computer, a tablet, an Internet of Things (IoT) device, a wearable computer, or the like. More generally, the consumer terminal 160 may be any electronic device capable of displaying content that is webtoon content (for example, an e-book reader).

As described above, the content server 150 may be a service platform that provides content, or may be part of such a service platform. Depending on the embodiment, the content server 150 may include the computer system 100, or the computer system 100 may include the content server 150.

The consumer terminal 160 and the content server 150 are computer systems and may include components similar to those of the computer system 100, so redundant description is omitted.

A more specific method of providing the text information 50 associated with content through the computer system 100, and the operations of the computer system 100, the consumer terminal 160, and the content server 150, are described in more detail with reference to FIGS. 3 to 19 below.

In the detailed description that follows, operations performed by the computer system 100, the processor 120, or their components may, for convenience of description, be described as operations performed by the computer system 100.

FIG. 3 is a flowchart illustrating a method of extracting text from an image of content and providing text information including the extracted text, according to an embodiment.

In step 310, the computer system 100 may identify content that includes an image 10 and is a target for generating the text information 50. For example, the computer system 100 may identify content including the image 10 that has been uploaded to the content server 150. The computer system 100 may monitor the content server 150 periodically (e.g., every hour), and when it identifies that content including the image 10 has been uploaded to the content server 150, or that previously uploaded content including the image 10 has been updated, it may identify that content as a target for generating the text information 50. Alternatively, the computer system 100 may be notified by the content server 150 that content including the image 10 has been uploaded (or updated), and may identify the uploaded (or updated) content accordingly.
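The periodic monitoring in step 310 can be pictured with a minimal sketch. Everything named here (`ContentRecord`, `identify_targets`, the timestamp fields) is a hypothetical illustration and not part of the disclosure; the sketch only assumes that the content server exposes, for each piece of content, whether it contains an image and when it was uploaded and last updated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ContentRecord:
    """Hypothetical view of one piece of content on the content server."""
    content_id: str
    has_image: bool
    uploaded_at: datetime
    updated_at: datetime


def identify_targets(records, last_checked):
    """Return IDs of image-bearing content uploaded or updated after last_checked."""
    return [
        record.content_id
        for record in records
        if record.has_image
        and max(record.uploaded_at, record.updated_at) > last_checked
    ]


now = datetime(2023, 2, 9, 12, 0)
records = [
    ContentRecord("ep-1", True, now - timedelta(hours=3), now - timedelta(hours=3)),
    ContentRecord("ep-2", True, now - timedelta(minutes=20), now - timedelta(minutes=20)),
    ContentRecord("ep-3", True, now - timedelta(days=2), now - timedelta(minutes=5)),
    ContentRecord("note-1", False, now - timedelta(minutes=1), now - timedelta(minutes=1)),
]

# One monitoring pass, assuming the previous pass ran one hour ago.
targets = identify_targets(records, last_checked=now - timedelta(hours=1))
```

In the notification-based alternative, the same filter would simply be applied to the single record named in the notification instead of to a full listing.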

In step 320, the computer system 100 may extract text from the image 10 included in the content. For example, the computer system 100 may extract text from the image 10 using optical character recognition (OCR). As described above, the image 10 is at least part of webtoon content and may include a plurality of ordered cuts of the content and text that includes the dialogue of the content. In this case, the computer system 100 may extract the dialogue from among the text included in the image. The computer system 100 may extract the dialogue from the image 10 using a learning model that has been pre-trained to extract cuts and dialogue regions from the image 10 and to extract text from the dialogue regions.

The computer system 100 may generate the text information 50 including the extracted text. For example, the computer system 100 may process the extracted text so that it carries order information, and generate text information 50 that includes the processed text. The order information included in the text information 50 may include, for example, information indicating which cut the extracted text was extracted from (e.g., cut N, where N is an integer), information indicating which dialogue region it was extracted from (e.g., dialogue region K, where K is an integer), and information indicating which row a line of the extracted text occupies (e.g., [R], where R is an integer).
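As an illustration of the order information described above, the following sketch stores each extracted line with its cut number N, dialogue region number K, and row number R, and renders the lines in reading order. The class name, the rendering format, and the choice to start numbering at 0 are assumptions made for the example; the disclosure only states that the three kinds of order information may be included.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class TextLine:
    """One extracted line; field order defines the sort order (cut, region, row)."""
    cut: int      # cut N
    region: int   # dialogue region K
    row: int      # row [R] within the dialogue region
    text: str


def format_text_information(lines):
    """Render extracted lines in reading order with their order information."""
    return [
        f"cut {line.cut} | dialogue region {line.region} | [{line.row}] {line.text}"
        for line in sorted(lines)
    ]


# Lines may arrive in any order from the extraction stage.
extracted = [
    TextLine(1, 0, 0, "Who's there?"),
    TextLine(0, 1, 0, "Not yet."),
    TextLine(0, 0, 1, "are you awake?"),
    TextLine(0, 0, 0, "Hey,"),
]
text_information = format_text_information(extracted)
```

Keeping the three indices as separate fields, rather than baking them into a string, lets later stages (review tools, audio playback per cut) select the lines of a given cut directly.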

A more detailed method of extracting text from the image 10 is described with reference to FIGS. 4 to 7, 10, and 11 below.

In step 330, the computer system 100 may provide the text information 50 including the text extracted in step 320 as the text information 50 associated with the content.

For example, the computer system 100 may provide the text information 50 to an administrator terminal that manages the content, in response to a request from the administrator terminal (step 332). The administrator terminal may provide a tool for viewing and reviewing the provided text information 50.

In this regard, FIGS. 14 and 15 illustrate a method of viewing and reviewing text information on an administrator terminal, according to an example.

FIGS. 14 and 15 show screens of an administrator terminal displaying a tool for viewing and reviewing the text information 50.

The computer system 100 may provide the administrator terminal with functions that make it possible to review the text information 50.

FIG. 14 shows a list of text information 50 in the tool for viewing and reviewing the text information 50. As illustrated, the text information 50 may be sorted through a first UI 1410 according to a predetermined criterion (e.g., ID, content name, number, episode name, content posting date, text information modification date, or text information modifier). When one item of text information 1420 is selected from the list, the administrator terminal may switch to a screen for reviewing that text information 1420. For example, the administrator terminal may switch to a screen such as the one shown in FIG. 15.

As functions that make it possible to review the text information 50, the tool for viewing and reviewing the text information 50 may include at least one of a first function 1430 for editing the text information 50, a second function 1440 for downloading the text information 50, and a third function 1450 for setting whether the text information 50 can be updated.

For example, when the first function 1430 is selected, the administrator terminal may switch to a screen for reviewing the corresponding text information 50. As an example, the administrator terminal may switch to a screen such as the one shown in FIG. 15.

Through the second function 1440, the text information 50 may be downloaded to the administrator terminal as, for example, an Excel file.

When the third function 1450 is turned ON, updating of the text information 50 may be prohibited. That is, while the third function 1450 is ON, the text information may be neither changed nor deleted even if the content is updated or deleted. The third function 1450 may be OFF by default.
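The behavior of the third function 1450 can be sketched as a simple lock flag on a stored record. The record type and handler names below are hypothetical and chosen only for illustration; the sketch assumes that a content update would otherwise overwrite the stored text and that a content deletion would otherwise clear it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextInfoRecord:
    """Hypothetical stored text information for one episode."""
    text: Optional[str]
    update_locked: bool = False  # third function 1450; OFF by default


def on_content_updated(record, regenerated_text):
    """Overwrite the stored text unless the lock is ON."""
    if not record.update_locked:
        record.text = regenerated_text
    return record


def on_content_deleted(record):
    """Clear the stored text unless the lock is ON."""
    if not record.update_locked:
        record.text = None
    return record


unlocked = TextInfoRecord("old text")
locked = TextInfoRecord("reviewed text", update_locked=True)

on_content_updated(unlocked, "new text")   # replaced
on_content_updated(locked, "new text")     # kept as reviewed
on_content_deleted(locked)                 # still kept
```

A lock of this kind protects text that an administrator has already corrected by hand from being replaced by a fresh automatic extraction.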

Below, the method of reviewing the text information 50 is described in more detail with reference to FIG. 15. In FIG. 15, text information 1430 has been selected from the text information list.

As illustrated, the administrator terminal may display the title 1510 of the work to which the content belongs and the episode name 1520 of the content. When a button 1530 is selected, the episode list of the work may be called up. Through the episode list, the administrator can review the text information 50 of other episodes of the same work.

On the review screen, a cut image area 1540 may be displayed. The cut image area 1540 may include the selected cut of the content, text information 1590 including the text extracted from that cut, and the cut number ('0'). The text information 1590 may be configured to be editable by the administrator. Accordingly, text that the computer system 100 extracted incorrectly can be corrected through the text information 1590. A remote-control UI may further be displayed on the review screen. The remote-control UI may include a UI 1550 for moving to another cut (the administrator may enter the cut number directly), a UI 1560 for applying the modified text information 1590, a UI 1570 for applying the modified text information 1590 to the service more quickly (e.g., with the highest priority), and a UI 1580 for moving to the episode list.
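The difference between the UI 1560 (apply) and the UI 1570 (apply with the highest priority) could be realized in many ways; one possibility is a priority queue of pending reflections. The sketch below is only such an assumption, with hypothetical names, and uses Python's standard `heapq` module; the disclosure does not specify how prioritization is implemented.

```python
import heapq
import itertools


class ReflectionQueue:
    """Pending text-information updates waiting to be reflected in the service."""

    URGENT, NORMAL = 0, 1  # lower value is served first

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order per priority

    def submit(self, text_info_id, urgent=False):
        priority = self.URGENT if urgent else self.NORMAL
        heapq.heappush(self._heap, (priority, next(self._arrival), text_info_id))

    def pop_next(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = ReflectionQueue()
queue.submit("episode-12")                 # applied via UI 1560
queue.submit("episode-13")                 # applied via UI 1560
queue.submit("episode-14", urgent=True)    # applied via UI 1570

processing_order = [queue.pop_next() for _ in range(len(queue))]
```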

The functions and UIs of the above-described tool for viewing and reviewing the text information 50 may be provided to the administrator terminal under the control of the computer system 100.

In other words, the computer system 100 may cause the administrator terminal to display a first cut ('0') selected by the administrator from among the plurality of cuts of the content, together with text information 1590 including the dialogue extracted from that first cut ('0'). In this case, a user interface for editing the displayed text information 1590 may also be provided to the administrator terminal. In addition, the computer system 100 may further provide a user interface 150 that enables switching from the first cut ('0') to a second cut, which is another cut among the plurality of cuts.

As such, in the embodiment, the text information 50 generated and provided by the computer system 100 can be reviewed by an administrator using the tool on the administrator terminal, and the reviewed text information 50 can be reflected in the service that provides the content.

In addition, the computer system 100 may provide the text information 50 to the consumer terminal 160 that consumes the content, in response to a request from the consumer terminal 160 (step 334).

For example, in response to a request from the consumer terminal 160 that consumes the content, the computer system 100 may provide the consumer terminal 160 with audio information corresponding to the text information 50. The audio information may be the text included in the text information 50 converted into audio. For example, the text included in the text information 50 may be converted into audio using text-to-speech (TTS) technology.

In this regard, FIG. 8 is a flowchart illustrating a method of providing text information associated with content to a consumer terminal of a consumer who consumes the content, according to an example.

In the embodiment, the consumer terminal 160 viewing the content may be provided with audio information in which the text extracted from the image 10 of the content has been converted into audio on the basis of the text information. Providing audio information to the consumer terminal 160 in this way may amount to providing a read-aloud service for the dialogue of the content.

In step 810, when the consumer terminal 160 requests viewing of the content, the computer system 100 may call up the text information 50 associated with the content. For example, the consumer terminal 160 may view the content through a 'webtoon application', which is a dedicated application for viewing webtoon content, and the computer system 100 (or the content server 150) may call up the text information 50 accordingly.

In step 820, the computer system 100 (or the content server 150) may recognize which of the plurality of cuts of the content the consumer terminal is viewing. For example, the computer system 100 (or the content server 150) may recognize the cut of the content being displayed on the screen of the consumer terminal 160, or may recognize when a cut being displayed on the screen of the consumer terminal 160 is selected (or touched) by the user. Alternatively, when a cut of the content is displayed at a specific position (e.g., the central area) on the screen of the consumer terminal 160, the computer system 100 (or the content server 150) may recognize that cut.

In step 830, the computer system 100 (or the content server 150) may cause the consumer terminal 160 to output the audio information corresponding to the portion of the text information 50 that corresponds to the recognized cut. In other words, the consumer terminal 160 may output audio that reads aloud the dialogue corresponding to the cut of the content being displayed.

When the cut is no longer displayed on the consumer terminal 160, or moves away from the specific position (e.g., the central area) by more than a certain amount, the audio output may be stopped.
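Steps 820 and 830, together with the stop condition above, can be sketched for the "specific position" variant. The sketch assumes a vertically scrolled image in which each cut occupies a known pixel range, and it treats the vertical center of the viewport as the specific position; the function names and data layout are hypothetical. It returns only the text to be read aloud and leaves the TTS conversion itself out of scope.

```python
def recognize_cut(cut_bounds, scroll_top, viewport_height):
    """Return the number of the cut covering the viewport center, or None."""
    center = scroll_top + viewport_height / 2
    for number, (top, bottom) in sorted(cut_bounds.items()):
        if top <= center < bottom:
            return number
    return None


def text_to_read(cut_bounds, text_by_cut, scroll_top, viewport_height):
    """Return the dialogue to read aloud, or None when playback should stop."""
    cut = recognize_cut(cut_bounds, scroll_top, viewport_height)
    if cut is None:
        return None
    return text_by_cut.get(cut)


# Pixel ranges (top, bottom) of each cut in the scrolled image.
cut_bounds = {0: (0, 800), 1: (1000, 1900)}
text_by_cut = {0: "Hey, are you awake?", 1: "Who's there?"}

at_first_cut = text_to_read(cut_bounds, text_by_cut, scroll_top=100, viewport_height=600)
between_cuts = text_to_read(cut_bounds, text_by_cut, scroll_top=550, viewport_height=600)
at_second_cut = text_to_read(cut_bounds, text_by_cut, scroll_top=900, viewport_height=600)
```

The touch-selection variant described in step 820 would bypass `recognize_cut` and pass the selected cut number directly.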

Meanwhile, depending on the embodiment, a 'reader function' may first be selected and executed on the consumer terminal 160, and the above-described steps 820 and 830 may then be performed. The transition between cuts on the consumer terminal 160 may also take place automatically.

In this regard, FIG. 16 illustrates a method of providing a consumer terminal with audio information corresponding to text information, according to an example.

FIG. 16 shows the audio information output on the consumer terminal 160 when the first cut of the content is recognized (or when the 'reader function' is executed and the first cut is displayed on the consumer terminal 160). As illustrated, before the text extracted from the cut is read aloud, guidance phrases ("Using AI ... we kindly ask"; "The dialogue reading will now begin.") and then the "title of the work" and the "episode name" of the work may first be output as voice. After the audio information corresponding to the text in the recognized cut has been output, a guidance phrase such as "Please move to the next cut." may be output as voice.

As described with reference to FIGS. 14 and 15, the text information 50 may be provided to the administrator terminal and thereby reviewed before the service is provided. The reviewed text information 50 may be served to the consumer terminal 160 viewing the content, as described above with reference to FIGS. 8 and 16.

The description of the technical features given above with reference to FIGS. 1 and 2 applies equally to FIGS. 3, 8, and 14 to 16, so redundant description is omitted.

FIG. 4 is a flowchart illustrating a method of extracting text from an image of content, according to an example.

The computer system 100 may extract the dialogue from the image 10 using a learning model that has been pre-trained to extract cuts and dialogue regions from the image 10 and to extract text from the dialogue regions.

The learning model is an artificial intelligence (AI) model and may be, for example, a model based on an artificial neural network or deep learning. The learning model may be pre-trained to extract cuts from the image 10 of content such as webtoon content, and may also be pre-trained to extract, from each cut, the text that corresponds to dialogue among the text included in that cut. For example, the learning model may be pre-trained to detect, in each cut, a dialogue region containing dialogue and to extract text from the detected dialogue region. The steps described below with reference to FIGS. 4 to 7 may, for example, be performed using such a learning model.

In step 410, the computer system 100 may detect a plurality of cuts in the image 10 of the content.

In step 420, the computer system 100 may generate a cut image for each of the plurality of cuts, each cut image including the corresponding cut. The computer system 100 may detect the cuts from the image 10 using the pre-trained learning model and may generate cut images each configured to include one cut. The learning model may also be trained to generate the cut images from the image 10. For example, in the image 10 of webtoon content, the cuts may be delimited by rectangular areas or the like, and the computer system 100 may detect the plurality of cuts of the content by recognizing these delimited cuts.

In this regard, FIG. 10A shows a cut image corresponding to a cut of content, according to an example.

In the illustrated example, the plurality of cuts included in the image 10 of the content may be arranged in the image 10 in the order in which they are scrolled in the vertical direction. In this case, each cut image may be configured to further include a blank area 1010 of a predetermined size above and below the cut it contains.

As illustrated, the cut image 1000-1 may include a cut area 1020 corresponding to the cut and a blank area 1010 around the cut area 1020. Depending on the embodiment, the blank area 1010 may be present not only above and below the cut area 1020 but also to its left and right.

By generating the cut image 1000-1 so that it includes the blank area 1010 around the cut area 1020, the computer system 100 can ensure that a dialogue region (or speech bubble) positioned across the border of a cut is adequately included in the cut image 1000-1.

In other words, when the computer system 100 detects the plurality of cuts from the image 10 and divides or crops the image 10 to generate cut images that each include a cut, it may have each cut image include the blank area 1010 between cuts. This prevents a dialogue region or speech bubble that straddles the cut line between two cuts from being left out of the cut image.

For example, the blank area 1010 included in the cut image 1000-1 may have a fixed size. Alternatively, a first cut image may be generated by cutting at the midpoint between the cut line of the first cut and the cut line of the second cut, which is the next cut, and at the midpoint between the cut line of the first cut and the cut line of the zeroth cut, which is the previous cut. In this case, the cut images may have different sizes depending on the spacing between cuts.
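The two cropping strategies just described (a fixed-size blank area, or cutting at the midpoint between neighboring cuts) can be compared with a short sketch. It assumes that each detected cut is given as a (top, bottom) pixel range in a vertically scrolled image; the function names, and the choice to extend the first and last cut images to the image edges in the midpoint variant, are assumptions made for the example.

```python
def crop_with_fixed_margin(cuts, margin, image_height):
    """Extend each (top, bottom) cut by a fixed blank margin, clamped to the image."""
    return [
        (max(0, top - margin), min(image_height, bottom + margin))
        for top, bottom in cuts
    ]


def crop_at_midpoints(cuts, image_height):
    """Split the image halfway between each pair of neighboring cuts."""
    bounds = []
    for index, (top, bottom) in enumerate(cuts):
        if index == 0:
            start = 0  # assumption: first cut image starts at the image top
        else:
            start = (cuts[index - 1][1] + top) // 2
        if index == len(cuts) - 1:
            end = image_height  # assumption: last cut image ends at the image bottom
        else:
            end = (bottom + cuts[index + 1][0]) // 2
        bounds.append((start, end))
    return bounds


# Three cuts in a 2000-pixel-tall image, listed from top to bottom.
cuts = [(100, 500), (700, 1200), (1300, 1800)]

fixed = crop_with_fixed_margin(cuts, margin=50, image_height=2000)
midpoint = crop_at_midpoints(cuts, image_height=2000)
```

With the fixed margin, every cut image grows by the same amount; with midpoint splitting, the gaps of 200 and 100 pixels between the cuts yield differently sized cut images, as the paragraph above notes.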

Meanwhile, depending on the content, the cuts may not be delimited by rectangular areas or the like. In this regard, FIG. 10B illustrates a method of extracting a cut from an image of content, according to an example.

As shown in FIG. 10B, cuts may be separated not by a visually recognizable cut line but by a gradation, a solid-color background, or some other background transition effect. The above-described learning model may be configured to detect cuts from the image 10 and generate cut images even in such cases. For example, the computer system 100 may determine the middle of the gradation area between two cuts to be the boundary of the cut image. Alternatively, when a gradation or solid color continues for at least a certain length in the image 10, the computer system 100 may determine the boundary of the cut image on that basis. For example, when a gradation or solid color continues for at least a certain length in the image 10, the computer system 100 may determine its middle to be the boundary of the cut image. A gradation or solid color of at least a certain length may be treated in the same manner as the blank area 1010 described above.
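For the solid-color case, the rule "if a solid color continues for at least a certain length, use its middle as the boundary" can be sketched without any learning model. The sketch below is a simplified stand-in, not the trained model of the disclosure: it represents the image as a list of pixel rows, treats a row as solid when all its pixels are equal, and ignores gradations. Skipping runs that touch the top or bottom edge of the image is also an assumption of the example.

```python
def solid_runs(rows):
    """Yield (start, end) for each maximal run of identical solid-color rows.

    A row is solid when all its pixels are equal; end is exclusive.
    """
    start = None
    for y, row in enumerate(rows):
        is_solid = len(set(row)) == 1
        if is_solid and start is not None and row[0] == rows[start][0]:
            continue  # the current run goes on
        if start is not None:
            yield start, y
        start = y if is_solid else None
    if start is not None:
        yield start, len(rows)


def cut_boundaries(rows, min_run):
    """Return the middle row of every interior solid run of at least min_run rows."""
    return [
        (start + end) // 2
        for start, end in solid_runs(rows)
        if end - start >= min_run and start > 0 and end < len(rows)
    ]


art = [1, 2]     # a row containing drawn content
blank = [0, 0]   # a solid-color row

# Rows 0-1 art, 2-5 blank, 6-8 art, 9 blank, 10-11 art, 12-13 blank.
rows = [art] * 2 + [blank] * 4 + [art] * 3 + [blank] * 1 + [art] * 2 + [blank] * 2

boundaries = cut_boundaries(rows, min_run=3)
```

Only the four-row blank run qualifies here; the single blank row at index 9 is too short to count as a separation between cuts.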

The computer system 100 may assign an order to each extracted cut, or to the cut image including that cut, according to the order of the cut. For example, the computer system 100 may assign a sequence number to each extracted cut or to the cut image including that cut. This order or sequence number may be used as the order information included in the text information 50.

Meanwhile, depending on the composition of the image 10 of the content, the computer system 100 may extract an area containing only a speech bubble or dialogue region as a cut. For example, there may be a case in which only a speech bubble or dialogue region is present in a particular area and a certain area around it (above and below) is a gradation or solid color; in this case, the computer system 100 may extract the area in which only the speech bubble or dialogue region is present as a cut. Such a cut does not include a cut line. When an area containing only a speech bubble or dialogue region is extracted as a cut, the computer system 100 may generate a virtual cut line. A cut that includes such a virtual cut line may be assigned an order in the same way as other cuts. In other words, a cut or cut image that includes a virtual cut line may be assigned a sequence number just like a cut or cut image that includes an actual cut line.
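The sequence numbering described above, including cuts that only have a virtual cut line, can be illustrated as follows. The data class, the `has_cut_line` flag, and numbering from 0 are assumptions made for the example; the point is only that speech-bubble-only cuts are numbered in the same top-to-bottom sequence as ordinary cuts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedCut:
    """Hypothetical detection result for one cut in a vertically scrolled image."""
    top: int
    bottom: int
    has_cut_line: bool  # False for an area holding only a speech bubble or dialogue


def assign_sequence_numbers(cuts):
    """Number cuts from top to bottom; bubble-only cuts get a virtual cut line."""
    numbered = []
    for number, cut in enumerate(sorted(cuts, key=lambda c: c.top)):
        line_kind = "actual" if cut.has_cut_line else "virtual"
        numbered.append((number, line_kind, cut))
    return numbered


detected = [
    DetectedCut(900, 1100, has_cut_line=False),   # speech bubble on a plain background
    DetectedCut(0, 800, has_cut_line=True),
    DetectedCut(1200, 2000, has_cut_line=True),
]
numbered = assign_sequence_numbers(detected)
sequence = [(number, line_kind) for number, line_kind, _ in numbered]
```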

In step 430, the computer system 100 may extract text from the cut images corresponding to the plurality of cuts. The computer system 100 may extract the text from the cut images using a pre-trained learning model. For example, the learning model may be trained to detect, from a cut image, a dialogue area containing dialogue, and the computer system 100 may use OCR to extract text from the dialogue area detected with this learning model.

아래에서는 단계들(432 내지 436)을 참조하여, 컷 이미지로부 대사 영역을 검출하고 대사 영역으로부터 텍스트를 추출하는 방법을 더 자세하게 설명한다.Below, with reference to steps 432 to 436, a method for detecting a dialogue region from a cut image and extracting text from the dialogue region will be described in more detail.

In step 432, the computer system 100 may detect, for each cut image, a dialogue area containing dialogue of the content. The computer system 100 may detect the dialogue area from each cut image using a pre-trained learning model.

The dialogue area may be a speech bubble included in the image 10 of the content, an area containing a monologue or narration by a speaker or character of the content (i.e., an area whose text is uttered by the speaker or character but which is not delimited by a speech bubble), or an area containing text that explains the content (an area not delimited by a speech bubble, for example, text uttered by the author). Here, the speaker or character may be a character appearing in the content. The learning model is pre-trained to detect such dialogue areas from cut images, and may be pre-trained using images containing cuts from a large number of webtoon contents.

대사 영역에 포함되는 텍스트는 콘텐츠의 스토리를 설명하기 위해 필요한 텍스트로 간주될 수 있다. Text included in the dialogue area can be considered text necessary to explain the story of the content.

Relatedly, FIG. 11A shows a method of detecting a dialogue area from a cut image, according to an example.

FIG. 11A shows a cut image 1100. The computer system 100 may detect dialogue areas 1110 from the cut image 1100. The dialogue areas 1110 may include a first dialogue area 1110-1 and a third dialogue area 1110-3, which are speech bubbles, and a second dialogue area 1110-2, which is a dialogue area not delimited by a speech bubble. The second dialogue area 1110-2 may be the above-described area containing a monologue or narration by a speaker or character of the content, or the area containing text that explains the content.

Using the pre-trained learning model, the computer system 100 may detect dialogue areas representing speech bubbles of various shapes, including the first dialogue area 1110-1 and the third dialogue area 1110-3. Using the pre-trained learning model, the computer system 100 may also detect the second dialogue area 1110-2, a dialogue area containing dialogue that is not delimited by a speech bubble. The learning model may be trained to detect the second dialogue area 1110-2 containing dialogue (as opposed to sound effects or text unrelated to conveying the story) by taking into account the position of the dialogue area and the length, size, and font of the text it contains.

Meanwhile, the computer system 100 may identify, among the text included in the cut image 1100, non-dialogue areas 1120 that do not contain dialogue. The non-dialogue areas 1120 may include, for example, areas containing text that is part of an object (a character, etc.) in the cut image 1100 (e.g., a first non-dialogue area 1120-1, a third non-dialogue area 1120-3, and a fourth non-dialogue area 1120-4). The non-dialogue areas 1120 may also include, for example, an area containing a pattern or text in the background of the cut (e.g., a second non-dialogue area 1120-2). As one example, an area containing text that corresponds to a sound effect in the cut image 1100 may be identified as a non-dialogue area 1120. A sound effect may be a textual rendering of a sound, effect, or the like produced by the action of an object in the content.

In an embodiment, for example, text on an object such as a flyer is unrelated to conveying the story of the content, so the area containing that text may be identified as the first non-dialogue area 1120-1. Likewise, text on a character's clothing is unrelated to conveying the story, so the areas containing that text may be identified as the third non-dialogue area 1120-3 and the fourth non-dialogue area 1120-4. A pattern or text rendered in the background of the cut is also unrelated to conveying the story, so the area containing that pattern or text may be identified as the second non-dialogue area 1120-2.

실시예에서는, 컴퓨터 시스템(100)은 텍스트를 포함하는 영역들 중에서 이러한 비대사 영역(1120)을 배제하고 대사 영역(1110)들 만을 검출해 낼 수 있다. In an embodiment, the computer system 100 may exclude the non-dialogue area 1120 and detect only the dialogue areas 1110 among areas containing text.

Relatedly, FIG. 7 is a flowchart showing a method of detecting a dialogue area from a cut image, according to an example.

In detecting dialogue areas from a cut image 1100, the computer system 100 may, in step 710, detect the areas containing text in each cut image 1100. The detected areas may include dialogue areas 1110 and non-dialogue areas 1120.

In step 720, the computer system 100 may identify, among the detected areas, the non-dialogue areas 1120, that is, areas containing text that belongs to the background of a cut (i.e., a pattern or text rendered in the background), text representing a sound effect of the content, or text judged to be unrelated to the story of the content (e.g., text such as that in the first to fourth non-dialogue areas 1120-1 to 1120-4 described above).

In step 730, the computer system 100 may detect, as the dialogue areas 1110 containing dialogue, the areas that remain after excluding the identified non-dialogue areas 1120 from the detected areas. The above-described learning model for identifying the dialogue areas 1110 may be pre-trained to distinguish dialogue areas 1110 from non-dialogue areas 1120.

Meanwhile, in some embodiments, unlike steps 720 and 730, the computer system 100 may detect the dialogue areas 1110 directly among the areas detected in step 710, without first identifying the non-dialogue areas 1120.
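The flow of steps 710 to 730 can be sketched as a simple filter. This is only an illustration under stated assumptions: the detected text areas are assumed to arrive already labeled by some upstream classifier, and the `TextRegion` type, the `kind` labels, and `NON_DIALOGUE_KINDS` are hypothetical names, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TextRegion:
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in the cut image
    kind: str  # hypothetical label, e.g. "bubble", "narration", "sfx"


# Hypothetical labels for background text, sound effects, and object text.
NON_DIALOGUE_KINDS = {"background", "sfx", "object"}


def detect_dialogue_regions(regions: List[TextRegion]) -> List[TextRegion]:
    """regions is the output of step 710: every text-containing area."""
    # Step 720: identify the non-dialogue areas among the detected areas.
    non_dialogue_ids = {id(r) for r in regions if r.kind in NON_DIALOGUE_KINDS}
    # Step 730: whatever remains is treated as a dialogue area.
    return [r for r in regions if id(r) not in non_dialogue_ids]


detected = [
    TextRegion((10, 10, 90, 40), "bubble"),
    TextRegion((5, 60, 50, 80), "sfx"),
    TextRegion((20, 100, 95, 130), "narration"),
    TextRegion((60, 140, 80, 150), "object"),
]
print([r.kind for r in detect_dialogue_regions(detected)])
```

In the variant that skips step 720, the same result would be obtained by selecting the dialogue kinds directly instead of excluding the non-dialogue ones.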

아래에서는, 검출된 대사 영역으로부터 텍스트를 추출하는 방법을 더 자세하게 설명한다.Below, we explain in more detail how to extract text from the detected dialogue region.

In step 434, the computer system 100 may use OCR to extract, for each detected dialogue area, the text contained in that dialogue area. In other words, the computer system 100 may extract text separately for each dialogue area. Because text is extracted per dialogue area, OCR can be prevented from misrecognizing text that lies on the same line of the cut image but outside a given dialogue area (or text in another dialogue area on the same line) as part of that dialogue area. The dialogue in one dialogue area forms a single unit of meaning, so extracting text per dialogue area lets the extracted text describe the story of the content more accurately. In other words, extracting text per dialogue area can ensure the accuracy of the order of the dialogue represented by the text.

관련하여, 도 11b는 일 예에 따른, 말풍선(또는, 가상의 말풍선)인 대사 영역으로부터 텍스트를 검출하는 방법을 나타낸다. Relatedly, FIG. 11B shows a method of detecting text from a dialogue area that is a speech bubble (or a virtual speech balloon), according to an example.

As shown, text may be extracted through OCR from the recognized dialogue area 1110-2. The illustrated dialogue area 1110-2 may include a total of three lines 1140: FFF..., GGG..., and HHH.... Each extracted line of text may be associated with order information indicating which row of the dialogue area 1110-2 the line corresponds to. Accordingly, the text information 50 may include this order information.

말풍선으로 구분되는 대사 영역들(1110-1, 1110-3)에 대해서도 유사한 방식으로 텍스트가 추출될 수 있다.Text can be extracted in a similar manner for the dialogue areas 1110-1 and 1110-3 separated by speech bubbles.

한편, 도시된 예시에서의 대사 영역(1110-2)은 전술한 독백 또는 나레이션을 포함하는 제1 영역 또는 콘텐츠의 설명하는 텍스트를 포함하는 제2 영역에 해당하는 대사 영역일 수 있다. 컴퓨터 시스템(100)은 이러한 제1 영역 또는 제2 영역에 대해서는, 제1 영역 또는 제2 영역에 대응하는 가상의 말풍선(1130)을 생성할 수 있다. 가상의 말풍선(1130)은 대사 영역에 대해 순서 정보를 부여하기 위해 생성되는 것일 수 있다. Meanwhile, the dialogue area 1110-2 in the illustrated example may be a dialogue area corresponding to the first area including the above-described monologue or narration or the second area including text explaining the content. The computer system 100 may generate a virtual speech bubble 1130 corresponding to the first or second area. The virtual speech bubble 1130 may be created to provide order information for the dialogue area.

In other words, the order information included in the text information 50 described later may include, based on the order within the image 10 of the speech bubbles (both the speech bubbles corresponding to detected dialogue areas and the virtual speech bubbles), information on which of the speech bubbles the text extracted for each detected dialogue area came from. That is, the sequence numbers of dialogue areas in the order information may be assigned in order, without distinction, to both dialogue areas delimited by speech bubbles and dialogue areas not delimited by speech bubbles.

단계(436)에서, 컴퓨터 시스템(100)은 검출된 대사 영역별로 추출된 텍스트에 기반하여 텍스트 정보(50)를 생성할 수 있다. In step 436, the computer system 100 may generate text information 50 based on text extracted for each detected dialogue region.

The text information 50 may include, as order information, information on which dialogue area and which cut the text extracted for each detected dialogue area was extracted from. The order information may further include row information indicating the position, within the corresponding dialogue area, of the text extracted for each detected dialogue area.

예컨대, 텍스트 정보(50)에 포함되는 순서 정보는, 일례로, 추출된 텍스트가 어떠한 컷에서 추출된 것인지를 나타내는 정보(예컨대, 컷 N, N은 정수), 어떠한 대사 영역에서 추출된 것인지를 나타내는 정보(예컨대, 대사 영역 K, K는 정수), 및 추출된 텍스트의 라인이 몇 번째 행(row)인지를 나타내는 정보(예컨대, [R], R은 정수)를 포함할 수 있다.For example, order information included in the text information 50 may include, for example, information indicating which cut the extracted text was extracted from (e.g., cut N, N is an integer), and information indicating which dialogue region it was extracted from. It may include information (e.g., dialogue area K, K is an integer), and information indicating which row of the extracted text line is (e.g., [R], R is an integer).

As an example, each of the plurality of cuts of the image 10 of the content may be assigned a first order number that is earlier the higher the cut is positioned vertically within the image 10 and, for cuts at the same vertical position, earlier the further to the left (or to the right) the cut is.

In addition, each dialogue area detected in a cut image may be assigned a second order number that is earlier the higher the dialogue area is positioned vertically within the cut image and, for dialogue areas at the same vertical position, earlier the further to the left (or to the right) the dialogue area is.

또한, 검출된 대사 영역별로 추출된 텍스트의 각 라인에는, 상하 방향으로 상측에 있을수록 앞서는 제3 순번이 행 정보로서 할당될 수 있다. Additionally, each line of text extracted for each detected dialogue region may be assigned a third order number, which is earlier as it is located at the top in the vertical direction, as row information.

The text information 50 may include, as order information, the first order number, the second order number, and the third order number for the text extracted from each dialogue area. For example, as described above with reference to FIG. 1, the text information 50 may include the cut number of the cut containing the extracted text (first order number), the number of the dialogue area (second order number), and the number of each line of the text (third order number).
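The three order numbers can be illustrated with a short sketch. It assumes that each dialogue area already carries the number of its cut and its top/left position, and that its lines are listed top to bottom; the `Region` type, `build_order_info`, and the `right_to_left` flag are hypothetical names introduced only for this example. Dialogue area numbers here run across all cuts rather than restarting in each cut.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    cut: int          # first order number of the cut containing the area
    top: int          # vertical position within the cut image
    left: int         # horizontal position within the cut image
    lines: List[str]  # OCR lines, already listed top to bottom


def build_order_info(
    regions: List[Region], right_to_left: bool = False
) -> List[Tuple[int, int, int, str]]:
    """Return (cut number, dialogue area number, row number, text) records."""

    def key(r: Region) -> Tuple[int, int, int]:
        # Higher up comes first; ties are broken by reading direction.
        return (r.cut, r.top, -r.left if right_to_left else r.left)

    records = []
    for region_no, region in enumerate(sorted(regions, key=key), start=1):
        for row_no, text in enumerate(region.lines, start=1):
            records.append((region.cut, region_no, row_no, text))
    return records


regions = [
    Region(cut=2, top=10, left=0, lines=["c"]),
    Region(cut=1, top=50, left=5, lines=["b1", "b2"]),
    Region(cut=1, top=50, left=0, lines=["a"]),
]
for record in build_order_info(regions):
    print(record)
```

Passing `right_to_left=True` reverses the tie-break for areas at the same height, which corresponds to content read from right to left.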

관련하여, 도 12는 일 예에 따른, 콘텐츠의 컷들과 각 컷에 포함된 대사 영역의 순서를 결정하는 방법을 나타낸다.In relation to this, Figure 12 illustrates a method of determining the order of cuts of content and dialogue regions included in each cut, according to an example.

도 12에서는, 웹툰 콘텐츠의 이미지(1200)에 포함된 컷들과 대사 영역들에 순서 정보로서 순번이 할당되는 일 예가 도시되었다. In FIG. 12 , an example in which a sequence number is assigned as order information to cuts and dialogue regions included in an image 1200 of webtoon content is shown.

As shown, a cut that is higher up and further to the left may be assigned an earlier cut number (C1 -> C5) (first order number). Likewise, a dialogue area that is higher up and further to the left may be assigned an earlier dialogue area number (1 -> 6) (second order number). A dialogue area number may be assigned to a speech bubble identified as a dialogue area but, as shown, may also be assigned to a dialogue area not delimited by a speech bubble (see dialogue area 6). In this case, the computer system 100 may generate a virtual speech bubble corresponding to the dialogue area not delimited by a speech bubble, and may assign the dialogue area number (6) to this virtual speech bubble.

한편, 도시된 예시와는 달리, 실시예에 따라서는 대사 영역 번호는 전체 컷들에 대해서가 아니라 각 컷별로 순서가 매겨질 수도 있다. Meanwhile, unlike the illustrated example, depending on the embodiment, the dialogue region numbers may be ordered for each cut rather than for all cuts.

Also, unlike the illustrated example, a cut that is higher up and further to the right may be assigned an earlier cut number (first order number). Likewise, a dialogue area that is higher up and further to the right may be assigned an earlier dialogue area number (second order number). The dialogue area number may be assigned to a speech bubble identified as a dialogue area.

말하자면, 국가별 또는 문화권 별로 콘텐츠(예컨대, 책, 만화, 글 등)를 읽는 방향이 상이할 수 있고, 전술한 제1 순번 및 제2 순번은 상기의 국가 또는 문화권에서의 콘텐츠를 읽는 방향에 따라 결정될 수 있다. In other words, the direction of reading content (e.g., books, comics, articles, etc.) may be different for each country or cultural area, and the above-mentioned first and second order numbers may vary depending on the direction of reading content in the country or cultural area. can be decided.

도 12에서 도시된 컷들과 대사 영역들에 대한 순서 정보에 따라, 최종적으로 생성되는 텍스트 정보(50)에 포함되는 순서 정보가 결정될 수 있다. According to the order information for cuts and dialogue areas shown in FIG. 12, order information included in the finally generated text information 50 may be determined.

As described above with reference to steps 410 to 430, in an embodiment, the computer system 100 does not extract text directly from the image 10. Instead, it first detects cuts from the image 10 through object detection, then detects dialogue areas from the cut images containing the cuts, and then uses OCR to extract text from each dialogue area.

이에 따라, 실시예에서는 '대사'에 해당하지 않는 불필요한 텍스트가 이미지(10)로부터 추출되지 않고, 추출된 텍스트의 대사의 순서의 정확도가 보장될 수 있다. Accordingly, in the embodiment, unnecessary text that does not correspond to 'dialogue' is not extracted from the image 10, and the accuracy of the order of dialogue in the extracted text can be guaranteed.

이상 도 1 내지 도 3, 도 8 및 도 14 내지 도 16을 참조하여 전술된 기술적 특징에 대한 설명은, 도 4, 도 7 및 도 10 내지 도 12에 대해서도 그대로 적용될 수 있으므로 중복되는 설명은 생략한다.The description of the technical features described above with reference to FIGS. 1 to 3, 8, and 14 to 16 can also be applied to FIGS. 4, 7, and 10 to 12, so overlapping descriptions will be omitted. .

도 5는 일 예에 따른, 컷 이미지(들)로부터 추출된 대사 영역들을 통합하여 통합 대사 영역 이미지를 생성하고, 통합 대사 영역 이미지로부터 텍스트를 추출하는 방법을 나타내는 흐름도이다. FIG. 5 is a flowchart illustrating a method of generating an integrated dialogue region image by integrating dialogue regions extracted from cut image(s) and extracting text from the integrated dialogue region image, according to an example.

With reference to FIG. 5, the method of detecting dialogue areas from cut images is described in more detail.

In step 510, the computer system 100 may integrate the dialogue areas detected from the cut images to generate a single integrated dialogue area image.

In step 520, the computer system 100 may use OCR to extract, for each dialogue area included in the generated integrated dialogue area image, the text contained in that dialogue area.

Relatedly, FIG. 19 shows an integrated dialogue area image, according to an example.

As shown, the integrated dialogue area image 1900 may include a plurality of dialogue areas (e.g., a plurality of speech bubbles). For example, the computer system 100 may integrate dialogue areas detected from a plurality of cut images to generate one integrated dialogue area image 1900. In some embodiments, the computer system 100 may instead integrate the dialogue areas detected from a single cut image to generate the integrated dialogue area image 1900. The integrated dialogue area image 1900 excludes, from the cut images, areas containing objects other than dialogue areas as well as blank areas, and may thus contain only dialogue areas. As one example, the computer system 100 may mask the dialogue areas (speech bubbles, etc.) in a cut image, and may cut out and combine the image portions corresponding to the masked dialogue areas to generate the integrated dialogue area image 1900. The text contained in each dialogue area can then be extracted with OCR, dialogue area by dialogue area, from the single integrated dialogue area image 1900. As a result, extracting the text of each dialogue area requires processing only a small number of integrated dialogue area images 1900 rather than a large number of images, which can save the network resources used for image processing and the resources used for text extraction. As one example, reconstructing the detected dialogue areas into the integrated dialogue area image 1900 and extracting text from it may improve processing speed by about 11 times compared with processing each cut image and extracting text from its dialogue areas without such reconstruction (i.e., ordinary serial processing).
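One way to picture the integration step is to stack the cropped dialogue areas into a single canvas while recording where each one lands, so that OCR results can later be mapped back to their dialogue areas. The sketch below is an assumption-laden illustration: images are plain lists of grayscale rows rather than real image objects, crops are stacked vertically and padded to a common width, and `build_integrated_image` and `placements` are hypothetical names. The description does not specify the layout actually used.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixel rows
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def build_integrated_image(
    cut_images: List[Image], boxes: List[Tuple[int, Box]], pad_value: int = 255
) -> Tuple[Image, List[Tuple[int, int, int]]]:
    """boxes holds (cut index, box) pairs for non-empty areas in reading order."""
    crops = [crop(cut_images[cut_index], box) for cut_index, box in boxes]
    width = max(len(c[0]) for c in crops)
    canvas: Image = []
    placements = []  # (dialogue area number, first row, end row exclusive)
    for region_no, c in enumerate(crops, start=1):
        start = len(canvas)
        for row in c:
            # Pad narrower crops on the right so every canvas row has equal width.
            canvas.append(row + [pad_value] * (width - len(row)))
        placements.append((region_no, start, len(canvas)))
    return canvas, placements


cut_a = [[r * 10 + c for c in range(4)] for r in range(3)]
cut_b = [[7] * 5 for _ in range(2)]
canvas, placements = build_integrated_image(
    [cut_a, cut_b], [(0, (1, 0, 3, 2)), (1, (0, 0, 3, 1))]
)
print(canvas)
print(placements)
```

Keeping the `placements` table alongside the canvas is what allows per-area text extraction on the single image without losing the order information.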

한편, 다른 실시예에 있어서, 컴퓨터 시스템(100)은 컷 이미지에 대한 대사 영역(말풍선 등)의 인식과 대사 영역별 텍스트 추출의 작업을 병렬로 수행할 수도 있다. 말하자면, 컴퓨터 시스템(100)은 대사 영역(말풍선 등)의 인식하기 위한 오브젝트 디텍션 작업과 각 대사 영역으로부터 텍스트를 추출하기 위한 문자 인식 작업을 병렬로 수행할 수 있다. 예컨대, 컴퓨터 시스템(100)은 컷 이미지로부터 대사 영역을 검출할 수 있고 검출된 대사 영역으로부터 텍스트를 추출할 수 있으며, 검출된 대사 영역으로부터 텍스트를 추출하는 동안 다른 대사 영역을 검출할 수 있다. 이와 같이 대사 영역의 검출의 처리와 텍스트 추출의 처리를 병렬로 수행하는 것은, 대사 영역의 검출의 처리와 텍스트 추출의 처리를 직렬로 처리하는 것에 비해 작업 속도를 개선시킬 수 있다. 예컨대, 이러한 병렬 처리는 직렬 처리에 비해 작업 속도를 2 내지 5배 향상시킬 수 있다. Meanwhile, in another embodiment, the computer system 100 may perform the tasks of recognizing dialogue areas (speech balloons, etc.) for a cut image and extracting text for each dialogue area in parallel. In other words, the computer system 100 can perform an object detection task to recognize dialogue areas (speech balloons, etc.) and a character recognition task to extract text from each dialogue area in parallel. For example, computer system 100 may detect a dialogue region from a cut image, extract text from the detected dialogue region, and detect other dialogue regions while extracting text from the detected dialogue region. In this way, performing the dialogue area detection process and text extraction process in parallel can improve work speed compared to performing the dialogue area detection process and text extraction process serially. For example, such parallel processing can improve work speed by 2 to 5 times compared to serial processing.
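The overlap between detection and recognition can be sketched as a producer/consumer pipeline. This is a structural illustration only: `detect` and `recognize` are hypothetical stand-ins for the object detection model and the OCR engine, Python threads are used merely to show the hand-off through a queue, and any actual speedup would depend on the models and hardware.

```python
import queue
import threading
from typing import Callable, List, Sequence, Tuple


def run_pipeline(
    cut_images: Sequence[object],
    detect: Callable[[object], List[object]],
    recognize: Callable[[object], str],
) -> List[Tuple[int, int, str]]:
    """Detect dialogue areas and recognize their text concurrently."""
    pending: queue.Queue = queue.Queue()
    results: List[Tuple[int, int, str]] = []
    done = object()  # sentinel marking the end of detection

    def detector() -> None:
        for cut_no, image in enumerate(cut_images, start=1):
            for region_no, region in enumerate(detect(image), start=1):
                pending.put((cut_no, region_no, region))
        pending.put(done)

    def recognizer() -> None:
        # Runs while the detector is still producing later dialogue areas.
        while True:
            item = pending.get()
            if item is done:
                break
            cut_no, region_no, region = item
            results.append((cut_no, region_no, recognize(region)))

    threads = [threading.Thread(target=detector), threading.Thread(target=recognizer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Toy stand-ins: a "cut image" is a string and its "areas" are the parts split on "|".
print(run_pipeline(["a|b", "c"], lambda img: img.split("|"), str.upper))
```

With a single recognizer thread the results keep the detection order, so the order numbers assigned during detection remain valid.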

다만, 상기의 병렬 처리에 있어서, 대사 영역의 검출에 비해 텍스트 추출(인식)이 많은 시간이 소요되는 바, 텍스트 추출에서 병목이 발생할 수 있다. 이에 따라, 상기의 병렬 처리는 전술한 검출된 대사 영역을 대사 영역 이미지(1900)로 재구성하여 대사 영역으로부터 텍스트를 추출하는 것에 비해서는 작업 속도가 늦어지게 될 수 있다. 즉, 검출된 대사 영역을 대사 영역 이미지(1900)로 재구성하여 대사 영역으로부터 텍스트를 추출함으로써 텍스트 추출에 소요되는 시간을 줄일 수 있다. However, in the above parallel processing, text extraction (recognition) takes more time than detection of the dialogue region, so a bottleneck may occur in text extraction. Accordingly, the above-mentioned parallel processing may be slower than the operation of extracting text from the dialogue region by reconstructing the above-described detected dialogue region into the dialogue region image 1900. In other words, the time required for text extraction can be reduced by reconstructing the detected dialogue region into a dialogue region image 1900 and extracting text from the dialogue region.

이상 도 1 내지 도 4, 도 7, 도 8, 도 10 내지 도 12 및 도 14 내지 도 16을 참조하여 전술된 기술적 특징에 대한 설명은, 도 5 및 도 19에 대해서도 그대로 적용될 수 있으므로 중복되는 설명은 생략한다.The description of the technical features described above with reference to FIGS. 1 to 4, 7, 8, 10 to 12, and 14 to 16 can also be applied to FIGS. 5 and 19, so the description is redundant. is omitted.

도 6은 일 예에 따른, 추출된 텍스트 및 추출된 텍스트와 연관된 순서 정보를 포함하는 텍스트 정보를 생성하는 방법을 나타내는 흐름도이다.FIG. 6 is a flowchart illustrating a method of generating text information including extracted text and order information associated with the extracted text, according to one example.

As described above with reference to FIGS. 4 and 11B, in step 610, when a detected dialogue area corresponds to the first area containing the above-described monologue or narration or to the second area containing text that explains the content, the computer system 100 may generate a virtual speech bubble 1130 corresponding to the first area or the second area. The virtual speech bubble 1130 may be generated in order to give order information to the dialogue area.

단계(620)에서, 컴퓨터 시스템(100)은 이러한 생성된 가상의 말풍선을 고려한 순서 정보를 포함하는 텍스트 정보(50)를 생성할 수 있다. 말하자면, 텍스트 정보(50)가 포함하는 순서 정보는, 검출된 대사 영역에 해당하는 말풍선 및 가상의 말풍선을 포함하는 말풍선들의 이미지(10) 내에서의 순서에 기반하여, 검출된 대사 영역별로 추출된 텍스트가 말풍선들 중 어느 것으로부터 추출된 것인지에 대한 정보를 포함하게 될 수 있다. 즉, 순서 정보에 포함되는 대사 영역의 순번은 말풍선으로 구분되는 대사 영역과 말풍선으로 구분되지 않는 대사 영역 모두에 구분 없이 순서대로 할당될 수 있다. In step 620, the computer system 100 may generate text information 50 including order information considering the generated virtual speech bubbles. In other words, the order information included in the text information 50 is extracted for each detected dialogue area based on the order in the image 10 of speech balloons including a speech balloon and a virtual speech balloon corresponding to the detected dialogue area. It may include information about which of the speech bubbles the text was extracted from. In other words, the sequence number of the dialogue area included in the order information can be sequentially assigned to both the dialogue area separated by speech bubbles and the dialogue area not separated by speech bubbles.

이상 도 1 내지 도 5, 도 7, 도 8, 도 10 내지 도 12, 도 14 내지 도 16 및 도 19를 참조하여 전술된 기술적 특징에 대한 설명은, 도 6에 대해서도 그대로 적용될 수 있으므로 중복되는 설명은 생략한다.The description of the technical features described above with reference to FIGS. 1 to 5, 7, 8, 10 to 12, 14 to 16, and 19 can also be applied to FIG. 6, so the description is redundant. is omitted.

도 9는 일 예에 따른, 콘텐츠의 업데이트 또는 삭제에 따라 텍스트를 재추출하여 텍스트 정보를 생성하거나 텍스트 정보를 삭제하는 방법을 나타내는 흐름도이다.FIG. 9 is a flowchart illustrating a method of generating text information or deleting text information by re-extracting text according to update or deletion of content, according to an example.

콘텐츠는 예컨대, 웹툰 콘텐츠와 같이 소비자가 열람 가능한 콘텐츠일 수 있다. 이러한 콘텐츠는 콘텐츠 서버(150)에 업로드된 후 소비자에 대해 서비스될 수 있다. The content may be content that consumers can view, such as webtoon content, for example. Such content may be uploaded to the content server 150 and then served to consumers.

콘텐츠는 그 작가 또는 콘텐츠의 관리자에 의해 내용 중 적어도 일부가 삭제, 변경 또는 추가되어 업데이트될 수 있고, 삭제될 수도 있다. 콘텐츠가 업데이트되는 경우 업데이트된 콘텐츠가 콘텐츠 서버(150)에 업로드될 수 있다. Content may be updated or deleted by deleting, changing, or adding at least part of the content by the author or the administrator of the content. When content is updated, the updated content may be uploaded to the content server 150.

단계(910)에서, 컴퓨터 시스템(100)은 콘텐츠 서버(150)에 대한 콘텐츠의 업데이트 여부 및 삭제 여부를 모니터링할 수 있다. 예컨대, 컴퓨터 시스템(100)은 주기적으로(예컨대, 1시간) 콘텐츠 서버(150)를 모니터링할 수 있고, 이에 따라, 콘텐츠 서버(150)에서 기 업로드된 콘텐츠의 업데이트 및 삭제를 식별할 수 있다. 또는, 컴퓨터 시스템(100)은 콘텐츠 서버(150)로부터 콘텐츠가 업데이트 및 삭제된 것을 통지 받을 수 있고, 이에 따라 콘텐츠의 업데이트 여부 및 삭제 여부를 식별할 수도 있다. At step 910, computer system 100 may monitor whether content on content server 150 is updated or deleted. For example, computer system 100 may periodically (e.g., once an hour) monitor content server 150 and thereby identify updates and deletions of content previously uploaded to content server 150. Alternatively, the computer system 100 may receive notification from the content server 150 that content has been updated or deleted, and may identify whether the content has been updated or deleted accordingly.

단계(920)에서, 컴퓨터 시스템(100)은 콘텐츠의 업데이트가 식별되면, 업데이트된 콘텐츠에 포함된 이미지로부터 텍스트를 추출할 수 있다. 말하자면, 컴퓨터 시스템(100)은 업데이트된 콘텐츠를 텍스트 추출 대상인 콘텐츠로 재식별하여 업데이트된 콘텐츠의 이미지로부터 텍스트를 재추출할 수 있으며, 재추출된 텍스트를 포함하는 텍스트 정보를 재생성할 수 있다. 한편, 실시예에 따라서는, 컴퓨터 시스템(100)은 콘텐츠의 업데이트된 부분을 식별할 수 있고, 텍스트의 재추출은 이러한 식별된 업데이트된 부분과 관련하여서만 수행될 수 있다.At step 920, if an update to the content is identified, the computer system 100 may extract text from the image included in the updated content. In other words, the computer system 100 can re-identify the updated content as content subject to text extraction, re-extract text from the image of the updated content, and regenerate text information including the re-extracted text. Meanwhile, in some embodiments, computer system 100 may identify updated portions of content, and re-extraction of text may be performed only with respect to these identified updated portions.

In step 930, when deletion of the content is identified, the computer system 100 may delete the text information 50 associated with the content.

Steps 920 and 930 may be performed only when the third function 1450, described above with reference to FIG. 15, for setting whether the text information 50 can be updated is set to OFF. That is, when the third function 1450 is set to ON, updating of already registered text information may be prohibited, and the computer system 100 may therefore neither change nor delete the text information even if the content is updated or deleted.

In this way, in the embodiment, text information is regenerated or deleted in response to updates and deletions of content on the content server 150, so that the text information can promptly reflect those updates and deletions.
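The handling of update and deletion events in steps 910 to 930, together with the lock setting, can be sketched roughly as follows. This is only an illustrative Python sketch, not the implementation of the disclosure: the names `TextInfoRecord`, `TextInfoStore`, `handle_event`, `extract_text`, and `locked` are all hypothetical, and the text extraction itself is passed in as a placeholder callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TextInfoRecord:
    """Text information registered for one piece of content (hypothetical)."""
    lines: List[str]
    # Hypothetical flag mirroring the "third function": True means updates are prohibited.
    locked: bool = False


@dataclass
class TextInfoStore:
    """Keeps text information in step with content changes (illustrative only)."""
    # Placeholder for the real extraction pipeline: content id -> extracted lines.
    extract_text: Callable[[str], List[str]]
    records: Dict[str, TextInfoRecord] = field(default_factory=dict)

    def handle_event(self, content_id: str, event: str) -> str:
        """React to an 'updated' or 'deleted' event identified by monitoring."""
        record = self.records.get(content_id)
        if record is not None and record.locked:
            # Locked text information is neither regenerated nor deleted.
            return "skipped"
        if event == "updated":
            # Re-extract text and regenerate the text information.
            self.records[content_id] = TextInfoRecord(self.extract_text(content_id))
            return "regenerated"
        if event == "deleted":
            # Delete the text information associated with the deleted content.
            self.records.pop(content_id, None)
            return "deleted"
        raise ValueError(f"unknown event: {event}")
```

The same `handle_event` entry point could be driven either by a periodic poll of the content server or by a notification from it; the sketch deliberately leaves that choice open.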

The descriptions of the technical features given above with reference to FIGS. 1 to 8, 10 to 12, 14 to 16, and 19 apply equally to FIG. 9, so redundant descriptions are omitted.

FIG. 13 illustrates a method of providing text information, according to an example.

FIG. 13 shows cut images (or cuts) 1310 extracted from an image of content subject to text extraction, and text information 1320 including the text extracted from the cut images 1310.

The computer system 100 may detect the dialogue regions of the cut images 1310 and extract, for each dialogue region, the text it contains.

The computer system 100 may generate text information 1320 including the extracted text and order information for that text. The text information 1320 may be referred to as alternative text.

The text information 1320 may include cut information (a cut number) identifying the cut to which the extracted text belongs, dialogue region information (a dialogue region number) identifying the dialogue region to which the extracted text belongs, and line information (a line number) for each line of the extracted text.
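One way to picture these fields is as a small nested data structure that can be flattened into rows of (cut number, dialogue region number, line number, text). The Python sketch below is an assumption made for illustration; the class names `Cut` and `DialogueRegion` and the function `flatten` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class DialogueRegion:
    """A dialogue region (e.g., a speech bubble) and its lines of text."""
    region_no: int
    lines: List[str] = field(default_factory=list)


@dataclass
class Cut:
    """A cut and the dialogue regions detected in it."""
    cut_no: int
    regions: List[DialogueRegion] = field(default_factory=list)


def flatten(cuts: List[Cut]) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (cut number, region number, line number, text) for every line."""
    for cut in cuts:
        for region in cut.regions:
            # Line numbers start at 1 within each dialogue region (an assumption).
            for line_no, text in enumerate(region.lines, start=1):
                yield cut.cut_no, region.region_no, line_no, text
```

For example, a cut 1 with a two-line speech bubble and a one-line speech bubble flattens to the rows `(1, 1, 1, ...)`, `(1, 1, 2, ...)`, and `(1, 2, 1, ...)`.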

As illustrated, the text information 1320 may be configured so that it can be expanded by selecting '>'. The text information 1320 may further include position information (coordinates) within the cut or within the image containing the cut. This position information may include at least one of the position of the cut (i.e., the position of the cut line), the position of the dialogue region (a speech bubble, etc.), and the position of each line of text included in the dialogue region. The position information may be used to determine the order information included in the text information described above. For example, the order of the cuts, the order of the dialogue regions, and the order of the lines of text within a dialogue region may be determined according to the coordinates indicated by the position information. In the textlist, the line number of each line of text included in a dialogue region may be displayed together with the text. The dialogue region number and/or the cut number may be displayed at a level above the line number. The line number makes it possible to identify whether the text within a dialogue region contains a line break.

In other words, 'cut -> dialogue region -> text included in the dialogue region (each line of text)' may be presented in the text information 1320 as a hierarchy. The text information 1320 may be used as meta information for providing the above-described audio information and the like to consumers of the content.
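The coordinate-based determination of order described above can be illustrated with a short sketch. It assumes that each cut or dialogue region is reduced to a hypothetical `(top, left)` coordinate pair with the vertical coordinate growing downward, and that the reading direction at the same height is selectable; the function `assign_order` and its `right_to_left` parameter are made up for this example.

```python
from typing import Dict, Tuple

# (top, left) in image coordinates; top grows downward (an assumption of this sketch).
Box = Tuple[float, float]


def assign_order(boxes: Dict[str, Box], right_to_left: bool = False) -> Dict[str, int]:
    """Assign order numbers starting at 1: higher boxes come first, and boxes
    at the same height are ordered left-to-right (or right-to-left)."""
    def sort_key(item: Tuple[str, Box]) -> Tuple[float, float]:
        _, (top, left) = item
        return (top, -left if right_to_left else left)

    ordered = sorted(boxes.items(), key=sort_key)
    return {name: number for number, (name, _) in enumerate(ordered, start=1)}
```

The same function could be applied once to the cuts of an image and once to the dialogue regions of each cut. Real layouts would likely need a tolerance when deciding that two boxes are at "the same height"; exact comparison is used here only to keep the sketch short.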

The descriptions of the technical features given above with reference to FIGS. 1 to 12, 14 to 16, and 19 apply equally to FIG. 13, so redundant descriptions are omitted.

FIG. 17 illustrates a method of extracting text from an image of webtoon content and providing text information including the extracted text, according to an example.

FIG. 17 illustrates in more detail, for the case where the content is webtoon content, exemplary operations of a webtoon application 1710 running on the consumer terminal 160 of a consumer, a webtoon server 1720 corresponding to the content server 150, and a text information management tool 1730 (shown in FIG. 17 as an AI alternative text management tool) running on the computer system 100 (or an administrator terminal) of the embodiment. The 'text information' is shown as alternative text in FIG. 17.

As illustrated, the webtoon application 1710 may launch a content viewer in response to a consumer's request to view content. When the above-described 'reader function (screen reader)' is used in the webtoon application 1710, the webtoon application 1710 may request from the content server 150 the alternative text, that is, the text information associated with the content. Audio information corresponding to the alternative text may then be provided from the content server 150 to the webtoon application 1710, so that a read-aloud service for the alternative text is provided in the webtoon application 1710. If the content is a cuttoon, in which the cuts are separated, audio information corresponding to the alternative text of the cut currently being viewed in the webtoon application 1710 may be provided to the webtoon application 1710.
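Selecting the portion of the alternative text that belongs to the cut being viewed can be sketched as below. The row layout `(cut number, dialogue region number, line number, text)` and the function name `text_for_cut` are assumptions made for this illustration; converting the resulting string to audio (for example, with a screen reader or a text-to-speech engine) is outside the sketch.

```python
from typing import Iterable, Tuple

# (cut number, dialogue region number, line number, text); a hypothetical row layout.
Row = Tuple[int, int, int, str]


def text_for_cut(rows: Iterable[Row], cut_no: int) -> str:
    """Return the text of one cut, joined in region order and then line order."""
    # Sorting the tuples orders rows by region number first, then line number.
    selected = sorted(row for row in rows if row[0] == cut_no)
    return " ".join(row[3] for row in selected)
```

A cut without any dialogue simply yields an empty string, which a caller could use to skip reading for that cut.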

The webtoon server 1720 may store a copy of the alternative text in order to provide the service to the webtoon application 1710. This copy of the alternative text may be provided by the text information management tool 1730.

Content including images may be uploaded to the webtoon server 1720, and the text information management tool 1730 may identify such content as content subject to text extraction and generate alternative text by extracting text from the images of the content.

Content may be modified (updated) or deleted on the webtoon server 1720, and the text information management tool 1730 may recognize the update or deletion and regenerate or delete the alternative text for that content. The modification (update) or deletion of content may be conveyed from the webtoon server 1720 to the text information management tool 1730 immediately or periodically. As described above, when the alternative text is locked, the regeneration or deletion of the alternative text may not take place.

The text information management tool 1730 may regenerate the alternative text for content when the AI modeling is modified (e.g., when the above-described learning model is updated). In this case as well, the alternative text may not be regenerated when it is locked.

As a function for inspecting the generated alternative text, the text information management tool 1730 may provide a function for editing (adding to, modifying, or deleting) the alternative text. This allows an administrator to correct inaccuracies in the alternative text generated by the AI model.

The original of the alternative text may be stored in the computer system 100 that includes the text information management tool 1730, and the alternative text may be delivered to the webtoon server 1720 periodically (e.g., every hour). However, when the UI 1570, described above with reference to FIG. 15, for reflecting text information in the service more quickly (e.g., with the highest priority) is selected, the (inspected) alternative text may be delivered to the webtoon server 1720 immediately.
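The difference between periodic delivery and immediate delivery can be sketched as a small publisher that queues ordinary changes and sends urgent ones at once. Everything named here (`AltTextPublisher`, `save`, `flush`, the `urgent` flag, and the `send` callable standing in for the transfer to the webtoon server) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict


class AltTextPublisher:
    """Queues alternative text for periodic delivery, with an immediate path."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        # `send` stands in for the transfer to the content server (an assumption).
        self._send = send
        self._pending: Dict[str, str] = {}

    def save(self, content_id: str, alt_text: str, urgent: bool = False) -> None:
        """Store alternative text; deliver it right away when marked urgent."""
        if urgent:
            # Drop any older queued version so it cannot overwrite this one later.
            self._pending.pop(content_id, None)
            self._send(content_id, alt_text)
        else:
            self._pending[content_id] = alt_text

    def flush(self) -> int:
        """Deliver everything queued; intended to be called once per period."""
        delivered = len(self._pending)
        for content_id, alt_text in self._pending.items():
            self._send(content_id, alt_text)
        self._pending.clear()
        return delivered
```

A scheduler would call `flush()` on each period (for example, hourly), while selecting the fast-reflection UI would correspond to calling `save(..., urgent=True)`.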

The descriptions of the technical features given above with reference to FIGS. 1 to 16 and 19 apply equally to FIG. 17, so redundant descriptions are omitted.

FIG. 18 illustrates a method of determining the speaker or character who uttered the text included in a dialogue region, according to an example.

The computer system 100 may determine, for each detected dialogue region described above, the speaker in the content who uttered the extracted text. The speaker may be a person appearing in the content or a character included in the content.

The computer system 100 may determine the speaker, for example, using the learning model described above. This learning model may be trained in advance on a large number of webtoon images to estimate who uttered the text in a dialogue region, such as a speech bubble, included in a webtoon image.

For example, as in the illustrated example, the computer system 100 may determine the speaker based on at least one of a speaker image presented in the image 10 of the content in association with the speech bubble corresponding to the detected dialogue region, and the color or shape of the speech bubble corresponding to the detected dialogue region.

As one example, images 1810-1 and 1820 representing speakers may be displayed around the speech bubbles or dialogue regions, and the computer system 100 may accordingly determine who uttered each of the dialogue regions 1810-2 and 1820-2. Alternatively, the speech bubble corresponding to a dialogue region may have a different shape depending on the speaker, and the computer system 100 may determine who uttered each of the dialogue regions 1820 according to the shape of the speech bubble. Alternatively, the computer system 100 may determine who uttered each of the dialogue regions 1820 according to whom the tail of the speech bubble points toward. Alternatively, the speech bubble corresponding to a dialogue region may have a different color depending on the speaker, and the computer system 100 may determine who uttered each of the dialogue regions 1830 according to the color of the speech bubble.

In the embodiment, the text information 50 generated based on the text extracted for each detected dialogue region may further include information about the determined speaker. Accordingly, the text information 50 may include not only the order information of the text extracted from the content but also information about who uttered that text.
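The cues described above (a speaker image near the speech bubble, or the color or shape of the speech bubble) can be illustrated with a simple rule-based stand-in. The disclosure describes using a pre-trained learning model for this step; the lookup below is not that model, only a hedged sketch of how the cues could map to a speaker. The names `Bubble`, `nearby_speaker`, `determine_speaker`, and `style_to_speaker` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Bubble:
    """A detected speech bubble with the visual cues used in this sketch."""
    text: str
    color: str
    shape: str
    # Speaker whose image appears next to the bubble, if one was recognized.
    nearby_speaker: Optional[str] = None


def determine_speaker(
    bubble: Bubble, style_to_speaker: Dict[Tuple[str, str], str]
) -> Optional[str]:
    """Prefer the speaker image cue; otherwise fall back to color and shape."""
    if bubble.nearby_speaker is not None:
        return bubble.nearby_speaker
    # Hypothetical per-title table mapping (color, shape) to a character name.
    return style_to_speaker.get((bubble.color, bubble.shape))
```

The returned name (or `None` when no cue applies) could then be stored alongside the extracted text as the speaker information in the text information.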

The descriptions of the technical features given above with reference to FIGS. 1 to 17 and 19 apply equally to FIG. 18, so redundant descriptions are omitted.

The apparatus described above may be implemented with hardware components, software components, and/or a combination of hardware components and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The medium may store a computer-executable program persistently, or may store it temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like configured to store program instructions. Other examples of the medium include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from that described, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from that described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method of providing text information associated with content, performed by a computer system, the method comprising:
identifying content that is uploaded to a content server and includes an image;
extracting text from the image included in the content; and
providing text information including the extracted text as text information associated with the content.

The method of claim 1, wherein
the image includes a plurality of ordered cuts of the content and text including dialogue of the content,
the extracted text is the dialogue extracted from the text included in the image, and
the text information includes each of a plurality of lines included in the dialogue and order information for each line.

The method of claim 2, wherein extracting the text comprises:
detecting the plurality of cuts in the image;
generating a cut image for each of the plurality of cuts, each cut image including the corresponding cut; and
extracting text from the cut images corresponding to the plurality of cuts.

The method of claim 3, wherein
the plurality of cuts are included in the image in an order along the vertical scroll direction, and
each cut image is configured to further include a blank area of a predetermined size above and below the cut.

The method of claim 3, wherein extracting text from the cut images comprises:
detecting, for each cut image, a dialogue region containing the dialogue;
extracting, for each detected dialogue region, the text included in the dialogue region using optical character recognition (OCR); and
generating the text information based on the text extracted for each detected dialogue region,
wherein the dialogue region is a region containing a speech bubble included in the image, a region containing a monologue or narration by a speaker or character of the content, or a region containing text explaining the content, and
the text information includes, as the order information, information indicating from which dialogue region of which cut the text extracted for each detected dialogue region was extracted.

The method of claim 5, wherein the order information further includes line information indicating the position, within the corresponding dialogue region, of each line of the text extracted for each detected dialogue region.

The method of claim 6, wherein
each of the plurality of cuts is assigned a first order number that is earlier the higher the cut is positioned vertically within the image and, for cuts at the same vertical position, earlier the further left or the further right the cut is positioned,
each dialogue region detected in each cut image is assigned a second order number that is earlier the higher the dialogue region is positioned vertically within the cut image and, for dialogue regions at the same vertical position, earlier the further left or the further right the dialogue region is positioned,
each line of the text extracted for each detected dialogue region is assigned, as the line information, a third order number that is earlier the higher the line is positioned vertically, and
the text information includes, as the order information, the first order number, the second order number, and the third order number for the text extracted from each dialogue region.

The method of claim 5, wherein extracting text from the cut images further comprises:
when the detected dialogue region is a first region containing the monologue or narration or a second region containing the explanatory text, generating a virtual speech bubble corresponding to the first region or the second region,
wherein the order information includes information indicating from which speech bubble the text extracted for each detected dialogue region was extracted, based on the order, within the image, of speech bubbles comprising the speech bubble corresponding to the detected dialogue region and the virtual speech bubble.

The method of claim 5, wherein extracting text from the cut images comprises:
generating a single integrated dialogue region image by integrating the dialogue regions detected from the cut images; and
extracting, for each dialogue region included in the integrated dialogue region image, the text included in the dialogue region using optical character recognition (OCR).

The method of claim 5, wherein detecting the dialogue region comprises:
detecting regions containing text in each cut image;
identifying, among the regions, a non-dialogue region, which is a region containing text corresponding to the background of each cut, text representing a sound effect of the content, and text determined to be unrelated to the story of the content; and
detecting the regions other than the non-dialogue region as dialogue regions containing the dialogue.

The method of claim 2, wherein
the providing comprises providing the text information to an administrator terminal that manages the content, in response to a request from the administrator terminal,
the method further comprises providing the administrator terminal with a function enabling inspection of the text information, and
the function enabling the inspection includes at least one of a first function for enabling editing of the text information, a second function for enabling downloading of the text information, and a third function for setting whether the text information can be updated.

The method of claim 11, wherein
the function enabling the inspection includes the first function, and
providing the function enabling the inspection comprises:
displaying, on the administrator terminal, a first cut selected by the administrator from among the plurality of cuts and the text information including the dialogue extracted from the selected first cut;
providing a first user interface for editing the displayed text information; and
providing a second user interface for switching from the first cut to a second cut, which is another of the plurality of cuts.

The method of claim 2, wherein the providing comprises providing audio information corresponding to the text information to a consumer terminal that consumes the content, in response to a request from the consumer terminal.

The method of claim 13, wherein the providing comprises:
retrieving the text information associated with the content when viewing of the content is requested from the consumer terminal;
recognizing, among the plurality of cuts, the cut being viewed on the consumer terminal; and
outputting, at the consumer terminal, audio information corresponding to the portion of the text information that corresponds to the recognized cut.

The method of claim 1, further comprising:
monitoring whether the content on the content server has been updated or deleted;
when an update to the content is identified, extracting text from the image included in the updated content; and
when deletion of the content is identified, deleting the text information associated with the content.

The method of claim 5, further comprising:
determining a speaker of the content who uttered the text extracted for each detected dialogue region, the speaker being determined based on at least one of a speaker image presented in the image in association with the speech bubble corresponding to the detected dialogue region, and the color or shape of the speech bubble corresponding to the detected dialogue region,
wherein the text information generated based on the text extracted for each detected dialogue region further includes information about the determined speaker.

A program recorded on a computer-readable recording medium for causing a computer system to execute the method of claim 1.

A computer system for providing text information associated with content, the computer system comprising:
at least one processor configured to execute instructions readable by the computer system,
wherein the at least one processor is configured to:
identify content that is uploaded to a content server and includes an image;
extract text from the image included in the content; and
provide text information including the extracted text as text information associated with the content.

The computer system of claim 18, wherein
the image includes a plurality of ordered cuts of the content and text including dialogue of the content,
the extracted text is the dialogue extracted from the text included in the image,
the text information includes each of a plurality of lines included in the dialogue and order information for each line,
the at least one processor is configured to:
detect the plurality of cuts in the image, generate a cut image for each of the plurality of cuts, each cut image including the corresponding cut, and extract text from the cut images corresponding to the plurality of cuts; and
detect, for each cut image, a dialogue region containing the dialogue, extract, for each detected dialogue region, the text included in the dialogue region using OCR, and generate the text information based on the text extracted for each detected dialogue region,
the dialogue region is a region containing a speech bubble included in the image, a region containing a monologue or narration by a character of the content, or a region containing text explaining the content, and
the text information includes, as the order information, information indicating from which dialogue region of which cut the text extracted for each detected dialogue region was extracted.