KR102602936B1

KR102602936B1 - Electronic device for generating short form based on collected data through artificial intelligence and method using the same

Info

Publication number: KR102602936B1
Application number: KR1020230091581A
Authority: KR
Inventors: 이재호; 박기웅
Original assignee: (주)엔아이지씨; 이재호; 박기웅
Priority date: 2023-07-14
Filing date: 2023-07-14
Publication date: 2023-11-16
Also published as: KR102677612B1

Abstract

An electronic device according to an embodiment of the present disclosure automatically generates a short form based on data collected through artificial intelligence. The device includes a processor that controls the operations of a category database, a content collection module, a data classification module, and a short form providing module. The processor may be configured to collect first data and second data from at least one markup language body through the content collection module, to map categories and sounds corresponding to the first data and the second data through the data classification module, and to provide at least one short form related to the at least one markup language body through the short form providing module.

Description

Electronic device that automatically generates short forms based on data collected through artificial intelligence and method using the same {ELECTRONIC DEVICE FOR GENERATING SHORT FORM BASED ON COLLECTED DATA THROUGH ARTIFICIAL INTELLIGENCE AND METHOD USING THE SAME}

The present disclosure relates to an electronic device that automatically generates a short form based on data collected through artificial intelligence, and to a method using the same. More specifically, the present disclosure relates to a device that automatically generates a short form related to a markup language body by using data extracted from the markup language body source through an artificial intelligence learning model, and to a method using the same.

As video data has developed, video in short-form format, with a short playback time, has been attracting attention. Despite its short playback time, a short form is enough to hold a user's attention when it covers a field the user is interested in. The recent demand for short-form video supports this.

Users who have a terminal can use it to access online shopping websites, news websites where news is posted, and the like. If the information posted on each website is aggregated and a short form is generated in real time and provided to the user, the user can be reached more effectively, increasing sales or click-through rates. Accordingly, there is a growing need to extract data contained in the web pages that users access in real time and to generate a short form from it, so as to achieve the purpose of the web page.

KR 10-2022-0127714 A

An embodiment of the present disclosure is proposed to solve the above-described problem: after attribute information is extracted from various data in a markup language body source, content expected to be of high interest to the user can be generated as a short form and provided automatically.

The problems to be solved by the present disclosure are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

The processor according to an embodiment of the present disclosure may be configured to check, through the data classification module, the number of identical texts shared by the text constituting the first data and the text constituting the second data; to classify the first data and the second data as similar content through the data classification module when that number is greater than or equal to a preset number; and to classify the first data and the second data as dissimilar content through the data classification module when that number is less than the preset number.
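The threshold comparison described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the disclosure does not say how "identical texts" are delimited, so the sketch assumes lowercase, whitespace-separated tokens counted once each, and the function names and the default threshold of 3 are hypothetical.

```python
def shared_text_count(first_text: str, second_text: str) -> int:
    """Count distinct tokens that occur in both texts.

    Assumption: a "text" is a lowercase, whitespace-delimited token.
    """
    first_tokens = set(first_text.lower().split())
    second_tokens = set(second_text.lower().split())
    return len(first_tokens & second_tokens)


def classify_pair(first_text: str, second_text: str, preset_count: int = 3) -> str:
    """Return "similar" when the shared count reaches the preset number."""
    if shared_text_count(first_text, second_text) >= preset_count:
        return "similar"
    return "dissimilar"


if __name__ == "__main__":
    print(classify_pair("red cotton summer dress", "blue cotton summer dress"))
    print(classify_pair("red cotton summer dress", "wireless noise cancelling headphones"))
```

The comparison uses "greater than or equal to", matching the wording of the embodiment, so a pair that shares exactly the preset number of tokens is treated as similar.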

The processor of the electronic device according to an embodiment of the present disclosure may be configured to identify a first category and a second category from the category database; when the first data and the second data are similar content, to map the first data and the second data to the first category through the data classification module; and when the first data and the second data are dissimilar content, to map the first data to the first category and the second data to the second category through the data classification module. The first category and the second category are different categories, and each individual category may include subcategories of the same class.

The processor of the electronic device according to an embodiment of the present disclosure may be configured to map, through the data classification module, a first sound to the first data according to the category to which the first data is mapped, and a second sound to the second data according to the category to which the second data is mapped.
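The category and sound mapping can be illustrated with a small lookup. This is a sketch under stated assumptions: the category names, the sound file names, and the `SOUND_TABLE` dictionary standing in for the category database are all hypothetical, and the similarity decision is passed in as a boolean.

```python
# Hypothetical stand-in for the category database: one sound per category.
SOUND_TABLE = {
    "fashion": "upbeat_pop.mp3",
    "electronics": "synth_loop.mp3",
}


def map_to_categories(is_similar: bool, first_category: str, second_category: str):
    """Return (category of first data, category of second data)."""
    if first_category == second_category:
        raise ValueError("the first and second categories must differ")
    if is_similar:
        # Similar content: both data items go to the first category.
        return first_category, first_category
    # Dissimilar content: each data item gets its own category.
    return first_category, second_category


def map_sounds(categories, sound_table):
    """Look up one sound per data item from the category it was mapped to."""
    return tuple(sound_table[category] for category in categories)


if __name__ == "__main__":
    categories = map_to_categories(False, "fashion", "electronics")
    print(categories, map_sounds(categories, SOUND_TABLE))
```

Because the sound is chosen from the mapped category, two similar data items end up with the same sound, while dissimilar items can receive different ones.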

The processor of the electronic device according to an embodiment of the present disclosure may be configured to check, through the content collection module, rank information corresponding to each of the first data and the second data and event information corresponding to the at least one markup language body; to generate, through the short form providing module, a first short form based on the first data by reflecting the rank information corresponding to the first data and the event information corresponding to the first markup language body or second markup language body from which the first data was collected; to generate, through the short form providing module, a second short form based on the second data by reflecting the rank information corresponding to the second data and the event information corresponding to the first markup language body or second markup language body from which the second data was collected; and to provide the first short form and the second short form through the short form providing module.

The processor of the electronic device according to an embodiment of the present disclosure may be configured to identify keywords based on the text constituting the first data and the second data, to extract a first keyword for the first data and the second data using a preset first regular expression, and, after extracting the first keyword, to extract a second keyword using a preset second regular expression.
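The two-stage extraction can be sketched with Python's `re` module. The disclosure does not give the regular expressions themselves, so both patterns below are hypothetical examples (bracketed tags and percentages). Running the second pass on the text left over after the first pass is also an assumption of this sketch.

```python
import re

# Hypothetical first regular expression: bracketed tags such as "[Sale]".
FIRST_PATTERN = re.compile(r"\[([^\]]+)\]")
# Hypothetical second regular expression: percentages such as "30%".
SECOND_PATTERN = re.compile(r"\b\d+(?:\.\d+)?%")


def extract_keywords(text: str):
    """Run the first pattern, then run the second pattern on what is left."""
    first_keywords = FIRST_PATTERN.findall(text)
    # Assumption: first-pass matches are removed before the second pass.
    remaining = FIRST_PATTERN.sub(" ", text)
    second_keywords = SECOND_PATTERN.findall(remaining)
    return first_keywords, second_keywords


if __name__ == "__main__":
    print(extract_keywords("[Sale] Linen shirt 30% off [Summer]"))
```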

The electronic device according to an embodiment of the present disclosure may further include a dictionary database, and the processor may be configured to provide, as separate keyword data, those keywords among the third keywords extracted using a preset third regular expression whose extraction frequency is greater than or equal to a preset frequency, and to store the separate keyword data in the dictionary database.

The processor of the electronic device according to an embodiment of the present disclosure may be configured, when the separate keyword data is stored in the dictionary database, to designate the regular expression that extracts the separate keyword as a fourth regular expression.
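One way to picture the frequency threshold and the fourth regular expression is shown below. This is a sketch only: the hashtag pattern used as the third regular expression, the in-memory `set` standing in for the dictionary database, the default threshold of 2, and the choice to build the fourth pattern as an alternation of stored keywords are all assumptions not stated in the disclosure.

```python
import re
from collections import Counter

# Hypothetical third regular expression: hashtags such as "#linen".
THIRD_PATTERN = re.compile(r"#(\w+)")


def promote_frequent_keywords(texts, dictionary, preset_frequency=2):
    """Store keywords extracted at least `preset_frequency` times."""
    counts = Counter(
        keyword.lower() for text in texts for keyword in THIRD_PATTERN.findall(text)
    )
    frequent = sorted(k for k, n in counts.items() if n >= preset_frequency)
    dictionary.update(frequent)  # the set stands in for the dictionary database
    return frequent


def build_fourth_pattern(dictionary):
    """Build a pattern that matches any stored keyword as a whole word."""
    if not dictionary:
        return None
    # Longest keywords first so that longer alternatives are tried first.
    ordered = sorted(dictionary, key=lambda k: (-len(k), k))
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


if __name__ == "__main__":
    store = set()
    promote_frequent_keywords(["#linen shirt #summer", "#Linen trousers"], store)
    print(build_fourth_pattern(store).findall("Linen dress and linen scarf"))
```

Keeping the promoted keywords in a store that outlives a single page is what lets later pages benefit from earlier extraction runs.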

The processor of the electronic device according to an embodiment of the present disclosure may be configured, when third data is collected from a markup language body other than the first markup language body or the second markup language body, to provide a third short form corresponding to the third data based on the information cumulatively stored in the category database and the dictionary database during the generation of the first short form and the second short form from the first data and the second data.

A method of automatically generating a short form based on data collected through artificial intelligence according to an embodiment of the present disclosure may include collecting first data and second data from at least one markup language body through a content collection module; mapping categories and sounds corresponding to the first data and the second data through a data classification module; and providing at least one short form related to the at least one markup language body through a short form providing module.

According to the above-described means of solving the problem, the electronic device for automatic short form generation can use a search engine such as a crawling engine, or a machine learning processor, to automatically generate a short-form video from data related to a markup language body that a user accessed through a terminal, taken from that body's source, and can thereby provide immediate video information on the content the user is interested in.

In addition, according to various embodiments of the present disclosure, short-form videos are generated automatically, so there is no need to manually produce catalogs or similar materials containing the content. According to various embodiments of the present disclosure, the electronic device for automatic short form generation can use artificial intelligence to extract a large amount of content related to even a small number of web pages visited by the user, and can automatically generate a short form through similar content analysis.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a schematic block diagram of a system that automatically generates a short form based on data collected through artificial intelligence according to various embodiments of the present disclosure.
FIG. 2 is a schematic block diagram of the components of an electronic device for automatic short form generation according to various embodiments of the present disclosure.
FIG. 3 is a schematic flowchart of a method for automatically generating a short form according to various embodiments of the present disclosure.
FIG. 4 is a flowchart of the process of creating an automatic short form generation process model according to various embodiments of the present disclosure.
FIG. 5 is a flowchart of a short form generation process that reflects additional information of the markup language body according to various embodiments of the present disclosure.
FIG. 6 is a detailed flowchart of short form generation through the automatic short form generation process model according to various embodiments of the present disclosure.

Like reference numerals refer to like elements throughout this disclosure. The present disclosure does not describe every element of the embodiments, and content that is general in the technical field to which the disclosure pertains, or that overlaps between embodiments, is omitted. The terms 'unit', 'module', 'member', and 'block' used in the specification may be implemented in software or hardware; depending on the embodiment, a plurality of units, modules, members, or blocks may be implemented as a single component, or a single unit, module, member, or block may include a plurality of components.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only a direct connection but also an indirect connection, and an indirect connection includes a connection through a wireless communication network.

In addition, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated to the contrary.

Throughout the specification, when a member is said to be located "on" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Terms such as "first" and "second" are used to distinguish one component from another, and the components are not limited by these terms.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

The identification codes attached to the steps are used for convenience of explanation and do not describe the order of the steps; unless a specific order is clearly stated in context, the steps may be performed in an order different from the one given.

Hereinafter, the operating principle and embodiments of the present disclosure are described with reference to the accompanying drawings.

In this specification, a 'device according to the present disclosure' includes any of the various devices that can perform computational processing and provide results to a user. For example, a device according to the present disclosure may include all of a computer, a server device, and a portable terminal, or may take the form of any one of them.

Here, the computer may include, for example, a notebook, desktop, laptop, tablet PC, or slate PC equipped with a web browser.

The server device is a server that processes information by communicating with external devices, and may include an application server, computing server, database server, file server, game server, mail server, proxy server, web server, and the like.

The portable terminal is, for example, a wireless communication device that guarantees portability and mobility. It may include all kinds of handheld wireless communication devices, such as PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals and smartphones, as well as wearable devices such as watches, rings, bracelets, anklets, necklaces, glasses, contact lenses, or head-mounted devices (HMDs).

FIG. 1 is a schematic block diagram of a system that automatically generates a short form based on data collected through artificial intelligence according to various embodiments of the present disclosure.

Referring to FIG. 1, a system that automatically generates a short form based on data collected through artificial intelligence (hereinafter, the 'automatic short form generation system 10') includes an electronic device 100 and a terminal 200. Each node can exchange data with the other nodes, and the nodes can be connected through a network.

The electronic device 100 according to various embodiments of the present disclosure is a device that automatically generates short forms. A short form is content produced in a short video format. The electronic device 100 can automatically generate short-form content and provide it to the user based on data collected from the content of a specific site or markup body that the user browses or crawls. The electronic device 100 may transmit data to or receive data from another device (e.g., the terminal 200) through a wired and/or wireless connection.

The terminal 200 according to various embodiments of the present disclosure may be a device that receives the short form generated by the electronic device 100. The terminal 200 may also provide the electronic device 100 with the training data that the electronic device 100 needs to build a short form generation model using artificial intelligence. The terminal 200 may include devices carried by the user, such as a mobile phone, laptop computer, personal computer, or tablet. That is, the terminal 200 may be any device that the user carries and that is connected to the electronic device 100 through a network so that the two can exchange data.

FIG. 2 is a schematic block diagram of the components of an electronic device for automatic short form generation according to various embodiments of the present disclosure.

Referring to FIG. 2, the electronic device 100 may include, as internal components, a processor 110, a category database 120, and a dictionary database 130, but is not limited thereto. Each node can exchange data with the other nodes. The nodes may be directly electrically connected, or connected by wire and/or wirelessly through a network. The electronic device 100 of the present disclosure may perform the functions of the processor 110 through a separate server instead of the processor 110.

The electronic device 100 according to various embodiments of the present disclosure is a device that checks data that the user collects through the terminal 200 or that is obtained through input the user performs on the terminal 200. For example, the electronic device 100 may check the Internet cookie data, Internet log data, web page data, and the like that arise when the user accesses a web page through the terminal 200 and searches within it. That is, the electronic device 100 can check in real time the content that the user encounters using the terminal 200, and can process the data related to that content to generate a short form that the user is likely to be interested in.

Referring to FIG. 2, the processor 110 includes, but is not limited to, a content collection module 111, a data classification module 112, and a short form providing module 113.

The content collection module 111 according to an embodiment may collect data from at least one markup language body. For example, the markup language body may be an HTML body. The content collection module 111 can collect all data in the HTML body by converting it to text, and can also collect images, videos, and the like.
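As an illustration of collecting text and media references from an HTML body, the following sketch uses Python's standard `html.parser` module. The class name `BodyCollector`, the `collect` helper, and the decision to skip `script` and `style` content are assumptions of this example; the disclosure does not specify a parser or which elements are collected.

```python
from html.parser import HTMLParser


class BodyCollector(HTMLParser):
    """Collect visible text fragments and image/video source URLs."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.media_urls = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("img", "video", "source"):
            src = dict(attrs).get("src")
            if src:
                self.media_urls.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.texts.append(data.strip())


def collect(html: str):
    """Return (text fragments, media URLs) found in the given HTML."""
    collector = BodyCollector()
    collector.feed(html)
    collector.close()
    return collector.texts, collector.media_urls


if __name__ == "__main__":
    page = '<body><h1>Linen shirt</h1><img src="/img/shirt.jpg"><p>30% off</p></body>'
    print(collect(page))
```

The text fragments returned here are the kind of input that the similarity and keyword steps described in the embodiments would operate on.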

The data classification module 112 according to an embodiment may map categories and sounds corresponding to the data of the markup language body. For example, the data classification module 112 may determine the mutual similarity between data items and classify them into categories based on that similarity. In line with the category classification, the data classification module 112 may map a sound suited to each category to each data item. Accordingly, the processor 110 may generate a short form of a set duration (e.g., 1 minute) according to the number of data items, their categories, and so on.
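The disclosure does not say how the set duration is divided among the data items. Purely as an illustration, the sketch below assumes an even split of a 60-second short form across the collected items; the function name `allocate_segments` and the even-split rule are hypothetical.

```python
def allocate_segments(items, total_seconds=60.0):
    """Split the total duration evenly and return (item, start, end) tuples.

    Assumption: every data item gets an equal share of the short form.
    """
    if not items:
        return []
    per_item = total_seconds / len(items)
    return [
        (item, index * per_item, (index + 1) * per_item)
        for index, item in enumerate(items)
    ]


if __name__ == "__main__":
    for segment in allocate_segments(["shirt", "trousers", "coat"]):
        print(segment)
```

A category-aware variant could weight the shares instead of splitting evenly; that choice is left open here because the source gives no rule for it.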

The short form providing module 113 according to an embodiment may provide at least one short form related to the markup language body to the terminal 200. For example, the short form providing module 113 may generate a short form corresponding to the data of the markup language body, based on a model trained from the short form generation model used to operate the data classification module 112, and provide it to the terminal 200. That is, the short form providing module 113 can generate, as a short form, content that the user of the terminal 200 is likely to be interested in, and provide it. In particular, the short form providing module 113 can generate in real time a short form related to the content of the web page that the user is currently viewing through the terminal 200, and present it on at least part of that web page. As another example, the short form providing module 113 can generate such a short form in real time and present it as a pop-up on at least part of the web page being viewed.

The processor 110 may include the individual modules 111, 112, and 113. Here, the individual modules 111, 112, and 113 are functional blocks, each named according to a function that the processor 110 performs.

The processor 110 according to an embodiment of the present disclosure may be implemented with a memory (not shown) that stores data for an algorithm for controlling the operation of the components in the electronic device 100, or for a program that implements the algorithm, and with at least one functional block that performs the above-described operations using the data stored in the memory. The processor 110 and the memory may be implemented as separate chips or as a single chip.

To implement, in the electronic device 100, the various embodiments of the present disclosure described below with reference to FIGS. 3 to 6, the processor 110 may control any one of the components described above, or a combination of them.

실시예에 따른 통신부(미도시)는 외부 장치(예: 도 1의 단말기(200))와 통신을 가능하게 하는 하나 이상의 구성 요소를 포함할 수 있으며, 예를 들어, 방송 수신 모듈, 유선통신 모듈, 무선통신 모듈, 근거리 통신 모듈, 위치정보 모듈 중 적어도 하나를 포함할 수 있다.A communication unit (not shown) according to an embodiment may include one or more components that enable communication with an external device (e.g., the terminal 200 of FIG. 1), for example, a broadcast reception module, a wired communication module. , it may include at least one of a wireless communication module, a short-range communication module, and a location information module.

유선 통신 모듈은, 지역 통신(Local Area Network; LAN) 모듈, 광역 통신(Wide Area Network; WAN) 모듈 또는 부가가치 통신(Value Added Network; VAN) 모듈 등 다양한 유선 통신 모듈뿐만 아니라, USB(Universal Serial Bus), HDMI(High Definition Multimedia Interface), DVI(Digital Visual Interface), RS-232(recommended standard232), 전력선 통신, 또는 POTS(plain old telephone service) 등 다양한 케이블 통신 모듈을 포함할 수 있다. Wired communication modules include various wired communication modules such as Local Area Network (LAN) modules, Wide Area Network (WAN) modules, or Value Added Network (VAN) modules, as well as USB (Universal Serial Bus) modules. ), HDMI (High Definition Multimedia Interface), DVI (Digital Visual Interface), RS-232 (recommended standard 232), power line communication, or POTS (plain old telephone service).

In addition to a Wi-Fi module and a WiBro (Wireless broadband) module, the wireless communication module may include wireless communication modules supporting various wireless communication schemes such as GSM (Global System for Mobile Communication), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), UMTS (Universal Mobile Telecommunications System), TDMA (Time Division Multiple Access), LTE (Long Term Evolution), 4G, 5G, and 6G.

The short-range communication module is for short range communication and may support it using at least one of Bluetooth™, RFID (Radio Frequency Identification), infrared communication (Infrared Data Association; IrDA), UWB (Ultra Wideband), ZigBee, NFC (Near Field Communication), Wi-Fi (Wireless-Fidelity), Wi-Fi Direct, and Wireless USB (Wireless Universal Serial Bus) technologies.

The memory according to an embodiment may store data supporting various functions of the electronic device 100 and programs for the operation of the processor 110, may store input/output data (e.g., images, videos, etc.), and may store a plurality of application programs (applications) running on the electronic device 100 as well as data and instructions for the operation of the electronic device 100. At least some of these application programs may be downloaded from an external server via wireless communication.

The memory may include at least one type of storage medium among a flash memory type, a hard disk type, an SSD (Solid State Disk) type, an SDD (Silicon Disk Drive) type, a multimedia card micro type, a card-type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. The memory may also be a database that is separate from the electronic device 100 but connected to it by wire or wirelessly.

At least one component may be added or removed according to the performance of the components shown in FIG. 2. Those of ordinary skill in the art will also readily understand that the relative positions of the components may be changed according to the performance or structure of the device.

Meanwhile, each component shown in FIG. 2 refers to a software component and/or a hardware component such as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC).

FIG. 3 is a schematic flowchart of a method for automatically generating a short form according to various embodiments of the present disclosure.

In step S310, a processor (e.g., the processor 110 of FIG. 2) may collect data from a markup language body. For example, through a content collection module (e.g., the content collection module 111 of FIG. 2), the processor may collect data from the web page that a user is browsing on a terminal (e.g., the terminal 200 of FIG. 1). Here, the data may include first data and second data, which are examples of the various kinds of data that can be extracted from the markup language body source.

The processor according to an embodiment may extract the markup language body source of the original page for various web pages, such as a website for purchasing products or a news site. The processor may then store the extracted markup language body source of the original page in the memory. For example, the processor may extract the markup language body source from the web page on which the user is shopping and identify various data such as product type, product image, product reviews, product price, and detailed product information. As another example, the processor may extract the markup language body source from the web page on which the user is reading news and identify various data such as the images used in the article, the article content, comments, and the time the article was written. Because the processor collects data from the markup language body source that the user is viewing on the terminal in real time, it can readily determine what content the user is currently interested in.
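The collection step described above can be illustrated with a minimal sketch. It assumes the page marks product fields with CSS class names; the class names `product-name` and `price`, the `FieldCollector` class, and the `collect_fields` helper are hypothetical and are not taken from the disclosure. Only the Python standard library parser is used.

```python
from html.parser import HTMLParser


class FieldCollector(HTMLParser):
    """Collects the first text found inside elements whose class is in `wanted`.

    The class names passed in are hypothetical; a real page would need its own list.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.fields = {}
        self._current = None  # class name of the element being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            # Keep only the first value seen for each field.
            self.fields.setdefault(self._current, data.strip())
            self._current = None


def collect_fields(markup, wanted):
    """Return a dict mapping each wanted class name to its extracted text."""
    parser = FieldCollector(wanted)
    parser.feed(markup)
    parser.close()
    return parser.fields


page = '<div><h1 class="product-name">Trail Shoe</h1><span class="price">59,000</span></div>'
fields = collect_fields(page, ["product-name", "price"])
```

In this sketch `fields` plays the role of the collected data; which entries count as first data or second data is left open, as in the description.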

In step S320, the processor may map a category and a sound corresponding to the data. Through a data classification module (e.g., the data classification module 112 of FIG. 2), the processor may map a category and a sound corresponding to each of the first data and the second data.

The processor according to an embodiment may determine the content of the first data and the second data. For example, on determining that the first data concerns sports, the processor may map the first data to a sports category and then to a sound that matches sports. As another example, on determining that the second data concerns pop songs, the processor may map the second data to a pop song category and then to a sound that matches pop songs. That is, the processor may determine the content of the data collected from the markup language body source, classify each item of data into a category, and map a sound with a suitable mood according to that classification. Here, the category information and the sound information may be pre-stored in the memory, and may include category information and sound information that the processor cumulatively updates and stores as it generates short forms through the short form generation model.
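As a rough illustration of step S320, the following sketch maps a piece of text to a category and then to a sound. The keyword sets, the sound file names, and the overlap-counting rule are all assumptions made for the example; the disclosure does not specify how the content of the data is determined.

```python
# Hypothetical category keywords and category-to-sound table (not from the disclosure).
CATEGORY_KEYWORDS = {
    "sports": {"match", "league", "goal"},
    "pop_music": {"album", "chart", "single"},
}
CATEGORY_SOUNDS = {
    "sports": "upbeat_drums.mp3",
    "pop_music": "bright_synth.mp3",
}


def map_category_and_sound(text):
    """Return (category, sound) for the category sharing the most words with `text`.

    Returns (None, None) when no category keyword appears in the text.
    """
    words = set(text.lower().split())
    best = max(CATEGORY_KEYWORDS, key=lambda c: len(words & CATEGORY_KEYWORDS[c]))
    if not words & CATEGORY_KEYWORDS[best]:
        return None, None
    return best, CATEGORY_SOUNDS[best]


first_mapping = map_category_and_sound("League match ends with late goal")
second_mapping = map_category_and_sound("New single tops the chart")
```

A table lookup keeps the sound choice tied to the category, which mirrors the two-stage mapping in the text; a learned classifier could replace the keyword overlap without changing the second stage.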

In step S330, the processor may provide a short form related to the markup language body. For example, through a short form providing module (e.g., the short form providing module 113 of FIG. 2), the processor may provide at least one short form related to at least one markup language body to the user's terminal.

The processor according to an embodiment may provide the user, in real time, with a short form corresponding to the content the user is currently interested in. For example, the processor may generate in real time a short form related to the content of the web page that the user is viewing on the terminal, and provide it on at least a portion of the terminal's display.

FIG. 4 is a flowchart of the process of creating an automatic short form generation process model according to various embodiments of the present disclosure. A data classification module (e.g., the data classification module 112 of FIG. 2) may compare the texts constituting the first data and the second data collected through a content collection module (e.g., the content collection module 111 of FIG. 2) and determine the number of identical texts contained in the two items of data.

Referring to FIG. 4, in step S410, the data classification module may determine whether the number of identical texts among the texts constituting the first data and the second data is greater than or equal to a preset number (e.g., 3). Here, the preset number may be stored in the memory. If the number of identical texts is greater than or equal to the preset number, the data classification module may branch to step S420. If the number of identical texts is less than the preset number, the data classification module may branch to step S440.

In step S420, the data classification module may classify the first data and the second data as similar content. The data classification module may determine whether data are similar based on attribute information of the collected data. For example, the data classification module may collect title text from the data in a news web page and determine whether the data are similar. A news web page may list the titles of various articles. The data classification module may compare the title texts of these articles and, when the number of identical texts is greater than or equal to the preset number, determine that the titles belong to articles in the same category. As another example, the data classification module may collect product brand and product name text from the data in a shopping mall web page and determine whether the data are similar. A shopping mall web page may list various products. The data classification module may compare the brand and product name texts of these products and, when the number of identical texts is greater than or equal to the preset number, determine that the products belong to the same category.

In step S430, the data classification module may map the first data and the second data to a first category. Having classified the first data and the second data as similar content, the data classification module may map both to the same category.

In step S440, the data classification module may classify the first data and the second data as dissimilar content. The processor may look up a first category and a second category in a category database (e.g., the category database 120 of FIG. 2). Here, the first category and the second category are different categories, and each may be a category of the same class. Accordingly, the first category and the second category may each have subcategories of a lower class.

When the first data and the second data correspond to dissimilar content, the data classification module according to an embodiment may map the first data to the first category and the second data to the second category. That is, the data classification module may determine that the first data and the second data correspond to dissimilar content and map each to a different category. For example, based on the news text on a news web page, the data classification module may classify the first data into a sports news category and the second data into an economic news category.
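The branching of steps S410 to S440 can be sketched as follows. The sketch assumes that "identical texts" means whitespace-separated tokens shared by the two items after lowercasing; the disclosure does not define the comparison unit, and the function names and category labels are hypothetical.

```python
def count_shared_texts(first, second):
    """Count distinct tokens that appear in both texts (assumed comparison unit)."""
    return len(set(first.lower().split()) & set(second.lower().split()))


def map_pair_to_categories(first, second, threshold=3):
    """Mirror S410-S440: same category when enough tokens are shared, else different ones.

    Returns (label, category_of_first, category_of_second).
    """
    if count_shared_texts(first, second) >= threshold:   # S410 -> S420, S430
        return "similar", "category_1", "category_1"
    return "dissimilar", "category_1", "category_2"       # S410 -> S440


similar_pair = map_pair_to_categories(
    "Acme Trail Shoe Black 270", "Acme Trail Shoe White 260"
)
dissimilar_pair = map_pair_to_categories(
    "Team wins league final", "Central bank raises rates"
)
```

The default threshold of 3 follows the example value given in the text; in the described device it would be read from the memory.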

The data classification module according to an embodiment may map the first data and the second data to a first sound and a second sound, respectively, according to the category to which each is mapped. This may be a process of mapping a sound that suits the content that the data presents visually. When the first data is mapped to the first category, the processor may map the first sound to the first data; in this case, the first sound may be a sound that suits the first category. Likewise, when the second data is mapped to the second category, the processor may map the second sound to the second data; in this case, the second sound may be a sound that suits the second category. That is, in generating a short form, the processor may match sounds by genre and mood, for example of a product or a news article.

FIG. 5 is a flowchart of a short form generation process that reflects additional information of the markup language body according to various embodiments of the present disclosure.

In step S510, a processor (e.g., the processor 110 of FIG. 2) may identify rank information corresponding to each item of data and event information corresponding to the markup language body.

Through a content collection module (e.g., the content collection module 111 of FIG. 2), the processor according to an embodiment may identify rank information corresponding to each of the first data and the second data, and event information corresponding to at least one markup language body. For example, when generating a short form, the processor may additionally collect rank information of the web page on which a product is sold or of a news web page, event information on that web page, and the like. In this way, the processor identifies the rank information corresponding to the data and the information on events in progress in the markup language body source.

The processor according to an embodiment may classify the first data and the second data into their respective categories and learn from them. For example, the first data may be news-related data that is mapped to the first category and does not need to be frequently updated and stored. As another example, the second data may be product-related data that is mapped to the second category and needs to be frequently updated and stored. This may depend on whether each item of data is non-volatile or volatile information. That is, the categories of the present disclosure may be divided in various ways according to the nature or type of the data. Accordingly, the processor may create the short form generation model through the process of classifying each item of data, and may develop the model itself by learning from inputs on the basis of that model. Through the short form generation model, the processor may cumulatively apply and learn from the data received from the user's terminal, update the model, and then apply the updated model to new data.

In step S520, the processor may generate a first short form and a second short form. Through a short form providing module (e.g., the short form providing module 113 of FIG. 2), the processor may generate the first short form based on the first data, reflecting the rank information corresponding to the first data and the event information corresponding to the first markup language body or second markup language body from which the first data was collected. Likewise, through the short form providing module, the processor may generate the second short form based on the second data, reflecting the rank information corresponding to the second data and the event information corresponding to the first markup language body or second markup language body from which the second data was collected.

FIG. 6 is a detailed flowchart of short form generation through the automatic short form generation process model according to various embodiments of the present disclosure.

Referring to FIG. 6, in step S610, a processor (e.g., the processor 110 of FIG. 2) may identify keywords in the text constituting the data. The processor may identify the keywords based on the text constituting the first data and the second data.

According to an embodiment, the processor may extract a first keyword for the first data and the second data using a preset first regular expression. After extracting the first keyword, the processor may extract a second keyword using a preset second regular expression.

In step S620, the processor may extract an n-th keyword through an n-th regular expression. For example, the processor may extract the n-th keyword for the data using a preset n-th regular expression. A preset regular expression may be, for example, an expression for extracting data from the markup language body source of a domain or site, chosen based on the domain or site that the user accesses through the user's terminal (e.g., the terminal 200 of FIG. 1).

According to an embodiment, the first regular expression may be an expression that follows a convention of interpreting the keyword information from A to D in the markup language body as a. The processor may use the first regular expression to extract the information essential for generating a short form. Here, the essential information may be the attribute information needed to generate the short form. For a product, the attribute information may be, for example, the product category, product name, product image, and product price. For a web novel, it may be the title and image of the web novel. For news, it may be the title, image, and comments of the article.

According to an embodiment, the second regular expression may be an expression for extracting data that is more general than the data extracted by the first regular expression. For example, extracting the first keyword from the first data through the first regular expression may mean extracting the information contained in a unique keyword in the first markup language body. A unique keyword may be, for example, an item code, an item number, or a product code. Extracting the second keyword from the first data through the second regular expression may mean extracting, from the first markup language body, a keyword in more general use than the data extracted through the first regular expression. Extracting a third keyword through a third regular expression may in turn yield a keyword more general than the second keyword. That is, among the first keyword, second keyword, third keyword, and n-th keyword, the keyword becomes more general as n increases.

According to an embodiment, among the n-th regular expressions for extracting the n-th keyword, those with n of 3 or more may be classified into an else area. For example, the else area may correspond to an area that extracts keywords the user is very commonly likely to encounter.

In step S630, the processor may determine whether the extraction frequency of a keyword is greater than or equal to a preset extraction frequency. After extracting the first keyword and the second keyword through the first and second regular expressions, the processor may determine which of the third keywords extracted using the third regular expression have an extraction frequency greater than or equal to the preset frequency. The preset frequency (e.g., 3 times) may be stored in the memory. Third keywords extracted at or above the preset frequency may be provided to the user.
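Steps S610 to S630 can be sketched as a cascade of regular expressions followed by a frequency count. The three patterns below are hypothetical stand-ins: the disclosure does not give the actual expressions, only that earlier ones target unique keywords and that expressions from the third onward form the else area.

```python
import re
from collections import Counter

# Hypothetical patterns, ordered from most specific to most general.
FIRST_REGEX = re.compile(r"\bitem-\d+\b")                 # unique keywords, e.g. item codes
SECOND_REGEX = re.compile(r"\b(?:price|brand|title)\b")   # commonly used attribute words
ELSE_REGEX = re.compile(r"\b[a-z]{4,}\b")                 # "else" area: any remaining word


def extract_tiers(text):
    """Return [first_keywords, second_keywords, else_keywords] for one text.

    Each tier's matches are removed before the next, more general tier runs.
    """
    remaining = text.lower()
    tiers = []
    for pattern in (FIRST_REGEX, SECOND_REGEX):
        tiers.append(pattern.findall(remaining))
        remaining = pattern.sub(" ", remaining)
    tiers.append(ELSE_REGEX.findall(remaining))
    return tiers


def frequent_else_keywords(texts, min_count=3):
    """Return else-area keywords extracted at least `min_count` times across `texts`."""
    counts = Counter()
    for text in texts:
        counts.update(extract_tiers(text)[2])
    return sorted(word for word, count in counts.items() if count >= min_count)


log = [
    "item-1042 price waterproof jacket",
    "item-2001 brand waterproof boots",
    "item-3003 price waterproof gloves",
]
candidates = frequent_else_keywords(log)
```

Here `candidates` corresponds to the third keywords that reach the preset frequency and would be surfaced as separate keyword data.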

In step S640, the processor may provide separate keyword data. Among the third keywords, the processor may provide those whose extraction frequency is greater than or equal to the preset frequency as separate keyword data. With the third and later regular expressions treated as the else area, this may be a process in which content that appears frequently in the log is provided to a developer (e.g., an administrator) and, once the developer stores it in a dictionary database (e.g., the dictionary database 130 of FIG. 2) or it is automatically updated and stored there, that content is recognized as a keyword and used to extract attribute information from the data. For example, when the developer has stored log information in the dictionary database, the processor may extract the data recognized as keywords through the log information. As another example, when the processor has automatically updated the dictionary database with log information, the processor may likewise extract the data recognized as keywords through the log information. In this way, all keyword data may be stored in the dictionary database. The processor may create new regular expressions based on the results of the keyword extraction process and repeatedly perform the process of extracting keywords with those expressions, thereby building a more accurate short form generation model.

In step S650, the processor may store the separate keyword data in the dictionary database. The processor may store the keyword data for the third keyword and beyond (e.g., the third keyword, the fourth keyword, etc.) in the dictionary database. The processor may separately check whether the attribute information corresponding to the extracted data is correct. For example, for the price information of a product, the processor may match the price information extracted from the markup language body source against pre-stored information representing a price, in order to confirm that what was extracted is in fact price information. In this way, the processor can extract and map attribute information based on an initial match for the product's price information, without repeating the matching process.
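The price check described above can be illustrated with a small sketch. It assumes that the "pre-stored information representing a price" is a text pattern for Korean won amounts; the pattern and the `looks_like_price` helper are hypothetical and only show the idea of validating an extracted attribute before mapping it.

```python
import re

# Hypothetical stored pattern: optional currency prefix, digits with or without
# thousands separators, optional "원" suffix.
PRICE_PATTERN = re.compile(r"^(?:₩|KRW\s?)?(?:\d{1,3}(?:,\d{3})+|\d+)(?:원)?$")


def looks_like_price(value):
    """Return True when the extracted string matches the stored price pattern."""
    return bool(PRICE_PATTERN.match(value.strip()))


checks = {
    "59,000원": looks_like_price("59,000원"),
    "₩1,200,000": looks_like_price("₩1,200,000"),
    "Free shipping": looks_like_price("Free shipping"),
}
```

A value that fails the check would not be mapped as the price attribute; how the device reacts to a failed check is not stated in the description.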

In step S660, the processor may determine a fourth regular expression. When the separate keyword data is stored in the dictionary database, the processor may determine the regular expression that extracts the separate keyword to be the fourth regular expression. Here, the fourth regular expression is an example of a regular expression that follows the third regular expression.
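One way to picture step S660 is to build a new regular expression from the keywords held in the dictionary database. This is a sketch under the assumption that the dictionary is a plain set of strings; `build_keyword_regex` is a hypothetical helper, and the disclosure does not say how the fourth regular expression is actually constructed.

```python
import re


def build_keyword_regex(keywords):
    """Compile a case-insensitive pattern matching any stored keyword as a whole word."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    # Longer keywords first so that they win over shorter ones sharing a prefix.
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(keyword) for keyword in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


dictionary_keywords = {"waterproof", "gore-tex"}  # stand-in for the dictionary database
fourth_regex = build_keyword_regex(dictionary_keywords)
matches = fourth_regex.findall("Waterproof GORE-TEX shell")
```

Escaping each keyword with `re.escape` keeps characters such as the hyphen from being read as pattern syntax.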

According to an embodiment, the processor may use the else area of the regular expressions to check the attribute information of data extracted from the markup language body. Through the else area, the processor may store in the dictionary database the information in the log that is extracted at or above the preset frequency, and use it as keywords when checking attribute information in the future.

According to an embodiment, while checking the attribute information of data, the processor may recommend to the developer information in the dictionary database that is repeatedly stored or checked. This may correspond to a process in which the processor makes no recommendations at first but, through repeated learning, recommends more frequently as it visits more sites and its level of learning rises.

According to an embodiment, the processor may determine how well the sound to which data is mapped matches the short form generated from that data. The processor may generate sounds automatically or manually on a per-category basis. For example, the processor may collect sound data based on category keyword attributes. Here, the sound data may be sound data that does not infringe copyright. As another example, the processor may directly generate sound data based on category keyword attributes. As yet another example, the processor may load sound data stored by the developer.

According to an embodiment, the processor may match video data and sound data for short form generation. For example, the processor may set a base weight of 1/10 and a weight of 4/10 to 8/10 according to the popularity of the video data (e.g., number of likes, number of comments). As another example, the processor may assign a weight of 2/10 according to the viewing time of the video data and, to compensate for the limits of automatic video data matching, a weight of up to 1/10 reflecting a choice made by a developer (e.g., an administrator). Accordingly, the processor may divide the sound data into n sections (e.g., 10 sections) based on the total weight assigned to the short form to be generated, and match sound data at random. For example, when 55 requests for automatic short form generation are received, the processor may apply sound data randomly selected from the 10th section 10 times, sound data randomly selected from the 9th section 9 times, and so on, down to sound data randomly selected from the 1st section once (10 + 9 + ... + 1 = 55). In this way, the processor may update the weights once a day while keeping the number of sound data items in each section the same according to the assigned weights. In addition, when weights are equal, the processor may order the items by a priority such as most recently used or most recently created.
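As a non-limiting illustration, the section-based matching described above could be sketched as follows. The function names, the `(sound_id, weight)` tuple layout, and the choice to place any remainder in the top section are assumptions made for this sketch and are not taken from the disclosure. The sketch ranks the sound data by total weight, splits it into n sections, and draws from section k exactly k times per cycle, which gives 55 draws when n is 10.

```python
import random
from collections import Counter


def split_into_sections(sounds, n_sections=10):
    """Sort (sound_id, weight) pairs by weight and split them into sections.

    Section 1 holds the lowest weights and section n the highest.
    Any remainder is placed in the top section (an assumption of this sketch).
    """
    ranked = sorted(sounds, key=lambda s: s[1])
    size = len(ranked) // n_sections
    if size == 0:
        raise ValueError("need at least n_sections sound items")
    sections = {}
    for k in range(1, n_sections + 1):
        start = (k - 1) * size
        end = k * size if k < n_sections else len(ranked)
        sections[k] = ranked[start:end]
    return sections


def section_schedule(n_sections=10):
    """Return a list in which section k appears k times (55 entries for n = 10)."""
    return [k for k in range(1, n_sections + 1) for _ in range(k)]


def match_sounds(sounds, n_sections=10, rng=None):
    """Pick one sound per request: choose the section, then a random item in it."""
    rng = rng or random.Random()
    sections = split_into_sections(sounds, n_sections)
    schedule = section_schedule(n_sections)
    rng.shuffle(schedule)  # the order of requests is not fixed by the text
    return [(k, rng.choice(sections[k])[0]) for k in schedule]


if __name__ == "__main__":
    # Hypothetical catalogue of 100 sounds with increasing total weights.
    catalogue = [(f"sound_{i}", i / 100) for i in range(100)]
    picks = match_sounds(catalogue, rng=random.Random(0))
    print(Counter(section for section, _ in picks))
```

In this sketch the daily weight update would amount to recomputing the weights and calling `split_into_sections` again; tie-breaking by most recent use or creation is not modeled.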

When the processor according to an embodiment collects third data from a markup language body other than the first markup language body or the second markup language body, it may provide a third short form corresponding to the third data collected from the other markup language body, based on information cumulatively stored in the category database and the dictionary database while the first short form and the second short form were generated from the first data and the second data. That is, the processor may generate the third short form by applying the same and/or a similar method as that used to generate the first and second short forms. In this case, the processor may repeatedly train a short form generation model through machine learning so that, based on the process by which short forms were generated from the web pages of some sites, it generates short forms in the same and/or a similar manner from the web pages of a new site.

Meanwhile, the disclosed embodiments may be implemented in the form of a recording medium that stores instructions executable by a computer. The instructions may be stored in the form of program code and, when executed by a processor, may create program modules that perform the operations of the disclosed embodiments. The recording medium may be implemented as a computer-readable recording medium.

Computer-readable recording media include all types of recording media that store instructions readable by a computer. Examples include read-only memory (ROM), random access memory (RAM), magnetic tape, magnetic disks, flash memory, and optical data storage devices.

The disclosed embodiments have been described above with reference to the attached drawings. A person of ordinary skill in the art to which the present disclosure pertains will understand that the present disclosure may be practiced in forms different from the disclosed embodiments without changing its technical idea or essential features. The disclosed embodiments are illustrative and should not be construed as limiting.

10: Automatic short form generation system
100: electronic device
200: terminal
110: processor
111: Content collection module
112: Data classification module
113: Short form providing module
120: Category database
130: Dictionary database

Claims

An electronic device that automatically generates a short form based on data collected through artificial intelligence, the electronic device comprising:
a category database;
a content collection module;
a data classification module; and
a processor that controls operation of a short form providing module,
wherein the processor is configured to:
collect first data and second data from at least one markup language body through the content collection module,
map categories and sounds corresponding to the first data and the second data through the data classification module,
provide at least one short form related to the at least one markup language body through the short form providing module,
check, through the content collection module, rank information corresponding to each of the first data and the second data and event information corresponding to the at least one markup language body,
generate, through the short form providing module, a first short form based on the first data, reflecting the rank information corresponding to the first data and the event information corresponding to the first markup language body or the second markup language body from which the first data was collected,
generate, through the short form providing module, a second short form based on the second data, reflecting the rank information corresponding to the second data and the event information corresponding to the first markup language body or the second markup language body from which the second data was collected, and
provide the first short form and the second short form through the short form providing module.

The electronic device of claim 1,
wherein the processor is configured to:
check, through the data classification module, the number of identical texts between the text constituting the first data and the text constituting the second data,
classify the first data and the second data as similar content through the data classification module when the checked number of identical texts is equal to or greater than a preset number, and
classify the first data and the second data as dissimilar content through the data classification module when the checked number of identical texts is less than the preset number.
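
As a non-limiting illustration, the similarity check recited above could be sketched as follows. Treating "identical texts" as whitespace-separated tokens shared by both texts, lowercasing them before comparison, and using a default preset number of 3 are assumptions of this sketch; the claim does not specify them.

```python
def count_identical_texts(first_text, second_text):
    """Count tokens that appear in both texts (tokenization is an assumption)."""
    first_tokens = set(first_text.lower().split())
    second_tokens = set(second_text.lower().split())
    return len(first_tokens & second_tokens)


def classify_similarity(first_text, second_text, preset_number=3):
    """Return 'similar' when the shared-token count reaches the preset number."""
    shared = count_identical_texts(first_text, second_text)
    return "similar" if shared >= preset_number else "dissimilar"


if __name__ == "__main__":
    # Hypothetical product titles collected from two markup language bodies.
    print(classify_similarity("red running shoes sale", "blue running shoes sale"))
    print(classify_similarity("red running shoes", "wireless phone charger"))
```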

The electronic device of claim 2,
wherein the processor is configured to:
check a first category and a second category from the category database,
map the first data and the second data to the first category through the data classification module when the first data and the second data are similar content, and
map the first data to the first category and the second data to the second category through the data classification module when the first data and the second data are dissimilar content,
wherein the first category and the second category are different categories, and each category includes subcategories of the same class.

The electronic device of claim 3,
wherein the processor is configured to:
map a first sound to the first data according to the category to which the first data is mapped through the data classification module, and
map a second sound to the second data according to the category to which the second data is mapped through the data classification module.

(Deleted)

The electronic device of claim 1,
wherein the processor is configured to:
check keywords based on the text constituting the first data and the second data,
extract a first keyword from the first data and the second data using a preset first regular expression, and
extract a second keyword using a preset second regular expression after extracting the first keyword.

The electronic device of claim 6,
further comprising a dictionary database,
wherein the processor is configured to:
provide, as separate keyword data, those keywords among third keywords extracted using a preset third regular expression whose extraction frequency is higher than a preset frequency, and
store the separate keyword data in the dictionary database.
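
As a non-limiting illustration, the staged keyword extraction and the frequency-based selection of separate keyword data could be sketched as follows. The two patterns (hashtags first, then capitalized words), the function names, and the use of an in-memory set in place of the dictionary database are all hypothetical choices for this sketch; the claims do not specify the regular expressions.

```python
import re
from collections import Counter

# Hypothetical patterns; the preset regular expressions are not given in the claims.
FIRST_PATTERN = re.compile(r"#(\w+)")            # hashtag-style keywords
SECOND_PATTERN = re.compile(r"\b[A-Z][a-z]+\b")  # capitalized words


def extract_keywords(text):
    """Apply the first pattern, then the second, and return both keyword lists."""
    first_keywords = FIRST_PATTERN.findall(text)
    second_keywords = SECOND_PATTERN.findall(text)
    return first_keywords, second_keywords


def frequent_keywords(keywords, preset_frequency=1):
    """Return keywords extracted more often than the preset frequency."""
    counts = Counter(keywords)
    return {kw for kw, count in counts.items() if count > preset_frequency}


if __name__ == "__main__":
    text = "#sale Nike shoes and #sale Adidas shoes"
    first, second = extract_keywords(text)
    dictionary_db = set()  # stand-in for the dictionary database
    dictionary_db |= frequent_keywords(first + second)
    print(first, second, dictionary_db)
```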

The electronic device of claim 7,
wherein the processor is configured to:
determine, when the separate keyword data is stored in the dictionary database, a regular expression for extracting the separate keyword as a fourth regular expression.

The electronic device of claim 8,
wherein the processor is configured to:
provide, when third data is collected from a markup language body other than the first markup language body or the second markup language body, a third short form corresponding to the third data collected from the other markup language body, based on information cumulatively stored in the category database and the dictionary database while the first short form and the second short form are generated from the first data and the second data.

A method of automatically generating a short form based on data collected through artificial intelligence, the method comprising:
collecting first data and second data from at least one markup language body through a content collection module;
mapping categories and sounds corresponding to the first data and the second data through a data classification module; and
providing at least one short form related to the at least one markup language body through a short form providing module,
wherein the method further comprises:
checking, through the content collection module, rank information corresponding to each of the first data and the second data and event information corresponding to the at least one markup language body,
generating, through the short form providing module, a first short form based on the first data, reflecting the rank information corresponding to the first data and the event information corresponding to the first markup language body or the second markup language body from which the first data was collected,
generating, through the short form providing module, a second short form based on the second data, reflecting the rank information corresponding to the second data and the event information corresponding to the first markup language body or the second markup language body from which the second data was collected, and
providing the first short form and the second short form through the short form providing module.