KR20210120203A

KR20210120203A - Method for generating metadata based on web page

Info

Publication number: KR20210120203A
Application number: KR1020200036591A
Authority: KR
Inventors: 현윤아; 최주은; 김나운
Original assignee: 엔에이치엔 주식회사; 엔에이치엔애드 (주)
Priority date: 2020-03-26
Filing date: 2020-03-26
Publication date: 2021-10-07

Abstract

According to an embodiment of the present invention, a method for generating metadata based on a web page, performed by a metadata generating application executed on a processor of a computing device, comprises: extracting text included in the web page; extracting text based on characters in an image included in the web page; performing natural language processing on the extracted text to extract part of the extracted text as valid text; calculating the importance of the extracted valid text and determining part of the extracted valid text as a core keyword according to the calculated importance; and generating the metadata based on the determined core keyword.

Description

Method for generating metadata based on a web page

The present invention relates to a method of generating metadata based on a web page. More particularly, it relates to a method of generating metadata based on the text and images in a web page.

Crawling, which collects the data needed for search from websites, is a major issue. Crawling refers to a technology that collects the text provided by each page of countless websites and adds it to the index of search targets.

When a web page is crawled, storing the entire content of the page as the body also stores duplicate menus, advertisements, unnecessary tags, and the like. Unnecessary results may then be returned at search time, which reduces the usefulness of the stored content as search data.

Filtering algorithms have therefore been developed to automatically extract only the body content of a page, but no single filter is sufficient on its own, so various filters are used in combination.

Nevertheless, when a search term is entered, results based on data crawled from parts of a page unrelated to the search target are still returned, so satisfactory search efficiency is not obtained.

In addition, conventional approaches provide a management system that allows a website administrator to access the tag attributes of multiple pages and edit metadata, which is the data required for search. However, such a system is useful only when the website administrator specifies appropriate metadata for SEO (search engine optimization); the metadata must still be written entry by entry and edited again every time a page is updated.

In addition, for SEO it is important to write metadata that is actually related to the web page, but when a web page contains images as well as text, systems that derive meaningful keywords by also considering the characters in those images are lacking.

Therefore, there is a need for a technology that can extract all text, including the text present in the images of a web page, and generate metadata based on all the text extracted from the web page.

Meanwhile, as interest in such crawling has grown, the TextRank algorithm, which derives and summarizes the core keywords of a web page document, has been developed.

Here, the TextRank algorithm is an algorithm that derives core keywords from a web page and summarizes the page.

In detail, the TextRank algorithm builds a word graph or a sentence graph and then selects keywords and/or key sentences using PageRank, a graph ranking algorithm. The TextRank algorithm may then summarize the text set of a given web page using the selected keywords and/or key sentences.

PageRank is an algorithm proposed by Brin and Page (1998) that assigns weights to hyperlinked web documents according to their relative importance. A web page with a high PageRank can be interpreted as one that receives many links from other websites, that is, one that many other sites reference.

TextRank applies this PageRank algorithm. Based on PageRank's observation that an important website receives links from many other sites, TextRank calculates a ranking by importance using the words and/or sentences in a web page document.
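
The word-graph variant described above can be illustrated with a minimal sketch. It is not the implementation of this publication: the function name `textrank_keywords`, the co-occurrence window, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration. Words that co-occur within the window are linked, and a PageRank-style iteration distributes each word's score over its neighbors.

```python
from collections import defaultdict


def textrank_keywords(sentences, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style iteration over a co-occurrence graph.

    `sentences` is a list of token lists. All parameters are illustrative
    assumptions, not values taken from the publication.
    """
    # Build an undirected, weighted graph: an edge links two different
    # words that appear within `window` positions of each other.
    weights = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + window + 1]:
                if left != right:
                    weights[left][right] += 1.0
                    weights[right][left] += 1.0

    words = sorted(weights)
    scores = {word: 1.0 for word in words}
    for _ in range(iterations):
        new_scores = {}
        for word in words:
            # Each neighbor passes on its score in proportion to the
            # weight of the edge that points at `word`.
            rank = sum(
                weights[neighbor][word] / sum(weights[neighbor].values()) * scores[neighbor]
                for neighbor in weights[word]
            )
            new_scores[word] = (1.0 - damping) + damping * rank
        scores = new_scores

    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Called with tokenized sentences such as `[["sale", "shoes", "summer"], ["summer", "shoes", "discount"], ["shoes", "delivery"]]`, the function returns `(word, score)` pairs in descending order of score. A sentence graph could be ranked the same way by replacing words with sentences and co-occurrence counts with a sentence similarity measure.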

On the other hand, with the development of information and communication technology and growing interest in image analysis, the Google Cloud Vision API model, which can detect characters in an image, has been developed.

In detail, the Vision API model provides powerful pre-trained machine learning models through REST and RPC APIs, and can assign labels to images and quickly classify them into millions of predefined categories.

Such a Vision API model can detect objects and printed or handwritten text in an image, and can extract useful data from them to assist an image-based metadata generation process.

KR 10-2017-0094829 A

The present invention has been devised to solve the problems described above, and an object thereof is to provide a web page-based metadata generation method that performs crawling based on one or more of the text of a web page, the characters in its images, and the alternative text for its images, and generates metadata describing the properties of the web page.

Another object of the present invention is to provide a web page-based metadata generation method that generates metadata for a web page based on the attributes of the text crawled from that web page.

However, the technical problems to be solved by the present invention and its embodiments are not limited to those described above, and other technical problems may exist.

A method for generating metadata based on a web page according to an embodiment of the present invention is a method in which a metadata generating application executed on a processor of a computing device generates metadata based on a web page, the method comprising: extracting text included in the web page; extracting text based on characters in an image included in the web page; performing natural language processing on the extracted text to extract part of the extracted text as valid text; calculating the importance of the extracted valid text and determining part of the extracted valid text as a core keyword according to the calculated importance; and generating the metadata based on the determined core keyword.

Here, extracting text based on characters in an image included in the web page includes: extracting the image included in the web page; extracting characters from the extracted image; and converting the extracted characters into text.

In addition, extracting part of the extracted text as valid text includes: removing stop words from the extracted text; and removing stop sentences from the extracted text.
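
The valid-text step can be sketched as follows. This is an illustration under stated assumptions, not the publication's natural language processing: the stop word set `STOPWORDS`, the stop sentence patterns `STOP_SENTENCE_PATTERNS`, and the function name `extract_valid_text` are hypothetical, and a real system would likely use a language-specific tokenizer and larger stop lists.

```python
import re

# Hypothetical stop lists, chosen only for illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "for", "is", "in"}
STOP_SENTENCE_PATTERNS = [
    re.compile(r"all rights reserved", re.IGNORECASE),
    re.compile(r"^(log ?in|sign ?up)\b", re.IGNORECASE),
]


def extract_valid_text(raw_text):
    """Return the valid text as a list of token lists, one per kept sentence."""
    # Split on sentence punctuation and line breaks, dropping empty pieces.
    sentences = [s.strip() for s in re.split(r"[.!?\n]+", raw_text) if s.strip()]
    valid = []
    for sentence in sentences:
        # Remove stop sentences (boilerplate such as copyright notices).
        if any(pattern.search(sentence) for pattern in STOP_SENTENCE_PATTERNS):
            continue
        # Remove stop words from the sentences that remain.
        tokens = [t for t in re.findall(r"\w+", sentence.lower()) if t not in STOPWORDS]
        if tokens:
            valid.append(tokens)
    return valid
```

The token lists this returns are in the shape a word-graph or sentence-graph ranking step would consume.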

In addition, determining part of the extracted valid text as a core keyword includes: separating the valid text into words; calculating the importance of the separated words; and determining core words, selected in descending order of importance, as the core keyword.

In addition, determining part of the extracted valid text as a core keyword includes: separating the valid text into sentences; calculating the importance of the separated sentences; and determining core sentences, selected in descending order of importance, as the core keyword.

In addition, the metadata includes: a title metadata object included in a title section used as the title of the web page; a description metadata object of a description section used as the description of the web page; and a tag word metadata object of a tag word section used as the tag words of the web page.
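
One way to picture the three metadata objects is as a small record that can be rendered into a page's head. The sketch below is an assumption for illustration only: the class name `PageMetadata` is hypothetical, and the publication does not state that the objects are serialized as an HTML `<title>` element and `description` and `keywords` meta elements.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class PageMetadata:
    """Hypothetical container for the title, description, and tag word objects."""

    title: str
    description: str
    tags: list = field(default_factory=list)

    def to_html(self):
        # Assumed rendering: a <title> element plus standard meta elements.
        lines = [
            f"<title>{escape(self.title)}</title>",
            f'<meta name="description" content="{escape(self.description)}">',
        ]
        if self.tags:
            joined = escape(", ".join(self.tags))
            lines.append(f'<meta name="keywords" content="{joined}">')
        return "\n".join(lines)
```

Keeping the three objects in one record makes it straightforward to show them in an editing interface before they are written back to the page.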

In addition, generating the metadata includes setting at least one of the core sentences as the title metadata object according to the position or size of the characters in the image on which the core sentence is based.
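
A minimal sketch of such a selection rule follows. The publication only says that position or size is used; the specific rule here (prefer the tallest characters, and break ties in favor of the sentence nearest the top of the image) is an assumption, as are the names `ImageSentence` and `pick_title` and the pixel-based fields.

```python
from dataclasses import dataclass


@dataclass
class ImageSentence:
    """Hypothetical record for a core sentence recognized in an image."""

    text: str
    top: int     # y coordinate of the text's bounding box, in pixels
    height: int  # character height, in pixels


def pick_title(core_sentences):
    """Return the text of the sentence chosen as title, or None if there is none."""
    if not core_sentences:
        return None
    # Assumed rule: largest characters win; ties go to the sentence
    # closest to the top of the image (smallest `top`).
    best = max(core_sentences, key=lambda s: (s.height, -s.top))
    return best.text
```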

In addition, generating the metadata includes setting at least some of the core sentences as the description metadata object.

In addition, generating the metadata includes: extracting k core words in descending order of importance; and setting the extracted core words as the tag word metadata object.
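
Selecting the k most important core words is a top-k selection over the importance scores. The sketch below assumes the scores are available as a dictionary from word to importance; the function name `top_k_tags` is hypothetical.

```python
import heapq


def top_k_tags(word_scores, k):
    """Return the k words with the highest importance, most important first."""
    best = heapq.nlargest(k, word_scores.items(), key=lambda item: item[1])
    return [word for word, _ in best]
```

For example, `top_k_tags({"shoes": 0.9, "sale": 0.7, "delivery": 0.2}, 2)` yields `["shoes", "sale"]`, which would then populate the tag word metadata object.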

In addition, the method for generating metadata based on a web page according to an embodiment of the present invention further includes providing a user editing interface for the generated metadata.

The method for generating metadata based on a web page according to an embodiment of the present invention performs crawling based on one or more of the text of a web page, the characters in its images, and the alternative text for its images, and provides metadata describing the properties of the web page. It can thereby provide metadata that takes into account not only the plain text in the web page but also the text associated with the images the page contains.

In addition, by providing metadata that takes into account not only the plain text in a web page but also the text associated with its images, the method according to an embodiment of the present invention can further improve the accuracy and reliability of the provided metadata.

In addition, the method according to an embodiment of the present invention generates metadata for a web page based on the attributes of the text crawled from that page, and can thereby improve the quality of the generated metadata.

In addition, the method according to an embodiment of the present invention automatically generates metadata for a web page and provides an editing interface for the automatically generated metadata, so that metadata, which is updated frequently and is difficult for an individual to manage, can be defined and managed easily.

However, the effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned can be clearly understood from the description below.

FIG. 1 is a conceptual diagram of a system for generating metadata based on a web page according to an embodiment of the present invention.
FIG. 2 is an internal block diagram of a mobile type computing device according to an embodiment of the present invention.
FIG. 3 is an internal block diagram of a desktop type computing device according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of generating metadata based on a web page according to an embodiment of the present invention.
FIG. 5 is an example of a web page including an image according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining a method of generating metadata based on a web page including an image according to an embodiment of the present invention.
FIG. 7 is an example of providing an editing interface for metadata according to an embodiment of the present invention.
FIG. 8 is an example of providing a web page search call screen based on metadata according to an embodiment of the present invention.
FIG. 9 is an example of an attribute suitability interface according to an embodiment of the present invention.
FIG. 10 is an example of a preview interface according to an embodiment of the present invention.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described below in detail in conjunction with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms. In the following embodiments, terms such as first and second are used not in a limiting sense but to distinguish one component from another. A singular expression includes the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" mean that the features or components described in the specification are present, and do not preclude the possibility that one or more other features or components may be added. In the drawings, the sizes of components may be exaggerated or reduced for convenience of description. For example, the size and thickness of each component shown in the drawings are chosen arbitrarily for convenience of description, so the present invention is not necessarily limited to what is illustrated.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is a conceptual diagram of a system for generating metadata based on a web page according to an embodiment of the present invention.

Referring to FIG. 1, a system for generating metadata based on a web page according to an embodiment of the present invention may include a computing device 100, a metadata server 400, and a network 500.

In an embodiment, the computing device 100, the metadata server 400, and the network 500 may operate together to implement a metadata generating application that generates metadata describing the properties of a web page based on the images and text in that web page.

In detail, in an embodiment of the present invention, the metadata generating application (hereinafter, the meta application) may access a web page and extract the text in that web page.
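
Text extraction from a fetched page can be sketched with Python's standard `html.parser` module. The class name `PageTextExtractor` and its attribute names are hypothetical, and the sketch assumes the page's HTML has already been downloaded as a string. Besides visible text, it collects image sources (for later character recognition) and image alternative text, since the publication names both as inputs.

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Collect visible text, image sources, and image alt text from HTML."""

    SKIPPED = {"script", "style"}  # content that is not visible page text

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_sources = []
        self.alt_texts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img":
            attributes = dict(attrs)
            if attributes.get("src"):
                self.image_sources.append(attributes["src"])
            if attributes.get("alt"):
                self.alt_texts.append(attributes["alt"])

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is outside <script> and <style>.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())
```

After `parser = PageTextExtractor()` and `parser.feed(html)`, the three lists hold the page text, the image URLs to pass to a character recognition step, and the alternative text.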

In addition, in an embodiment, the meta application may perform natural language processing on the extracted text to derive valid text, that is, text that is meaningful as a search index for Internet search.

In addition, the meta application may summarize the web page based on the derived valid text and thereby derive core keywords for the web page.

Here, a core keyword may be text, among the valid text obtained through natural language processing, whose importance on the web page is determined to be at or above a predetermined criterion. This is described in detail later.

In addition, in an embodiment, the meta application may generate metadata based on the derived core keywords and may provide an editing interface for the generated metadata.

In addition, the meta application may provide a web page search call screen for the web page based on the generated metadata.

Here, the web page search call screen may be an interface that provides information about the web page and is displayed on the search results screen when a web page search based on the metadata is performed on a search engine. This is described in detail below.

Meanwhile, the computing device 100 and the metadata server 400 of FIG. 1 may be connected through the network 500.

Here, the network 500 refers to a connection structure that allows nodes such as the computing device 100 and the metadata server 400 to exchange information with one another. Examples of the network 500 include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a WIMAX (World Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a Wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

- Computing device (100)

In an embodiment of the present invention, the computing device 100 may execute a meta application that generates metadata describing the properties of a web page based on the images and text in that web page.

In addition, in an embodiment, the computing device 100 may be any of various types of computing device (for example, a mobile type or a desktop type) on which the meta application is installed.

1. Mobile Type Computing Device

In an embodiment of the present invention, the mobile type computing device 200 may be a mobile device, such as a smartphone or a tablet PC, on which the meta application is installed.

For example, the mobile type computing device 200 may be a smartphone, a mobile phone, a digital broadcasting terminal, a PDA (personal digital assistant), a PMP (portable multimedia player), a tablet PC, or the like.

FIG. 2 is an internal block diagram of a mobile type computing device 200 according to an embodiment of the present invention.

Referring to FIG. 2, a mobile type computing device 200 according to an exemplary implementation may include a memory 210, a processor assembly 220, a communication module 230, an interface module 240, an input system 250, a sensor system 260, and a display system 270. These components may be configured to be contained within the housing of the mobile type computing device 200.

In detail, the memory 210 stores the meta application 211, and the meta application 211 may store one or more of various application programs, data, and instructions for providing a web page-based metadata generation service.

For example, the memory 210 may include web page identification data, a text buffer, an image buffer, a location engine, a display engine, and the like for the web page-based metadata generation service.

That is, the memory 210 may store instructions and data that can be used to provide a web page-based metadata generation service environment.

In addition, the memory 210 may include at least one non-transitory computer-readable storage medium and a transitory computer-readable storage medium. For example, the memory 210 may be any of various storage devices such as a ROM, an EPROM, a flash drive, or a hard drive, and may include web storage that performs the storage function of the memory 210 on the Internet.

The processor assembly 220 may include at least one processor capable of executing the instructions of the meta application 211 stored in the memory 210 in order to perform the various tasks needed to provide the web page-based metadata generation service.

In an embodiment, the processor assembly 220 may control the overall operation of the components through the meta application 211 in the memory 210 in order to provide the web page-based metadata generation service.

The processor assembly 220 may include a central processing unit (CPU) and/or a graphics processing unit (GPU). In addition, the processor assembly 220 may be implemented with at least one of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), controllers, micro-controllers, microprocessors, and other electrical units for performing functions.

The communication module 230 may include one or more devices for communicating with other computing devices (for example, the metadata server 400). The communication module 230 may communicate over a wireless network.

In detail, the communication module 230 may communicate with a computing device that stores content sources for implementing the web page-based metadata generation service environment, and may communicate with various user input components, such as a controller that receives user input.

In an embodiment, the communication module 230 may transmit and receive various data related to the web page-based metadata generation service to and from the metadata server 400 and/or another computing device 100.

The communication module 230 may wirelessly transmit and receive data to and from at least one of a base station, an external terminal, and an arbitrary server over a mobile communication network built with communication devices that support technical standards or communication methods for mobile communication (for example, LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), 5G NR (New Radio), and WIFI) or short-range communication methods.

센서 시스템(260)은, 이미지 센서(261), 위치 센서(IMU, 263), 오디오 센서, 거리 센서, 근접 센서, 접촉 센서 등 다양한 센서를 포함할 수 있다. The sensor system 260 may include various sensors such as an image sensor 261 , a position sensor (IMU) 263 , an audio sensor, a distance sensor, a proximity sensor, and a contact sensor.

이미지 센서(261)는, 모바일 타입 컴퓨팅 디바이스(200) 주위의 물리적 공간에 대한 이미지 및/또는 영상을 캡처할 수 있다. The image sensor 261 may capture images and/or images of the physical space around the mobile type computing device 200 .

실시예에서 이미지 센서(261)는, 웹 페이지 기반의 메타데이터 생성 서비스에 관련된 영상(예컨대, 웹 페이지에 포함되는 이미지 등)을 촬영하여 획득할 수 있다.In an embodiment, the image sensor 261 may capture and acquire an image (eg, an image included in a web page, etc.) related to a web page-based metadata generating service.

또한, 이미지 센서(261)는, 모바일 타입 컴퓨팅 디바이스(200)의 전면 또는/및 후면에 배치되어 배치된 방향측을 촬영하여 영상을 획득할 수 있으며, 모바일 타입 컴퓨팅 디바이스(200)의 외부를 향해 배치된 카메라를 통해 물리적 공간을 촬영할 수 있다. In addition, the image sensor 261 may be disposed on the front or / and rear of the mobile type computing device 200 to acquire an image by photographing the disposed direction side, and face the outside of the mobile type computing device 200 . Physical space can be photographed through the placed camera.

이러한 이미지 센서(261)는, 이미지 센서장치와 영상 처리 모듈을 포함할 수 있다. 자세히, 이미지 센서(261)는, 이미지 센서장치(예를 들면, CMOS 또는 CCD)에 의해 얻어지는 정지영상 또는 동영상을 처리할 수 있다. The image sensor 261 may include an image sensor device and an image processing module. Specifically, the image sensor 261 may process a still image or a moving image obtained by an image sensor device (eg, CMOS or CCD).

또한, 이미지 센서(261)는, 영상 처리 모듈을 이용하여 이미지 센서장치를 통해 획득된 정지영상 또는 동영상을 가공해 필요한 정보를 추출하고, 추출된 정보를 프로세서에 전달할 수 있다.Also, the image sensor 261 may process a still image or a moving image obtained through the image sensor device using an image processing module to extract necessary information, and transmit the extracted information to the processor.

이러한 이미지 센서(261)는, 적어도 하나 이상의 카메라를 포함하는 카메라 어셈블리일 수 있다. 카메라 어셈블리는, 가시광선 대역을 촬영하는 일반 카메라를 포함할 수 있으며, 적외선 카메라, 스테레오 카메라 등의 특수 카메라를 더 포함할 수 있다. The image sensor 261 may be a camera assembly including at least one camera. The camera assembly may include a general camera that captures a visible light band, and may further include a special camera such as an infrared camera or a stereo camera.

The IMU 263 may sense at least one of the motion and acceleration of the mobile type computing device 200. For example, it may consist of a combination of various position sensors such as an accelerometer, a gyroscope, and a magnetometer. In addition, by interworking with a location communication module, such as the GPS of the communication module 230, it may recognize spatial information about the physical space around the mobile type computing device 200.

Also, based on the detected position and direction, the IMU 263 may obtain information for detecting and tracking the user's gaze direction and head movement.

또한, 일부 구현들에서, 메타 어플리케이션(211)은 이러한 IMU(263) 및 이미지 센서(261)를 사용하여 물리적 공간 내의 사용자의 위치 및 방향을 결정하거나 물리적 공간 내의 특징 또는 객체를 인식할 수 있다. Further, in some implementations, the meta-application 211 may use such an IMU 263 and an image sensor 261 to determine a user's location and orientation within the physical space or to recognize a feature or object within the physical space.

오디오 센서(265)는, 모바일 타입 컴퓨팅 디바이스(200) 주변의 소리를 인식할 수 있다. The audio sensor 265 may recognize a sound around the mobile type computing device 200 .

자세히, 오디오 센서(265)는, 모바일 타입 컴퓨팅 디바이스(200) 사용자의 음성 입력을 감지할 수 있는 마이크로폰을 포함할 수 있다. Specifically, the audio sensor 265 may include a microphone capable of detecting a voice input of a user of the mobile type computing device 200 .

실시예에서 오디오 센서(265)는 웹 페이지 기반의 메타데이터 생성 서비스를 위해 필요한 음성 데이터를 사용자로부터 입력 받을 수 있다.In an embodiment, the audio sensor 265 may receive voice data required for a web page-based metadata generation service from a user.

인터페이스 모듈(240)은, 모바일 타입 컴퓨팅 디바이스(200)를 하나 이상의 다른 장치와 통신 가능하게 연결할 수 있다. 자세히, 인터페이스 모듈(240)은, 하나 이상의 상이한 통신 프로토콜과 호환되는 유선 및/또는 무선 통신 장치를 포함할 수 있다. The interface module 240 may communicatively connect the mobile type computing device 200 with one or more other devices. Specifically, the interface module 240 may include wired and/or wireless communication devices that are compatible with one or more different communication protocols.

이러한 인터페이스 모듈(240)을 통해 모바일 타입 컴퓨팅 디바이스(200)는, 여러 입출력 장치들과 연결될 수 있다. The mobile type computing device 200 may be connected to various input/output devices through the interface module 240 .

예를 들어, 인터페이스 모듈(240)은, 헤드셋 포트나 스피커와 같은 오디오 출력장치와 연결되어, 오디오를 출력할 수 있다. For example, the interface module 240 may be connected to an audio output device such as a headset port or a speaker to output audio.

예시적으로 오디오 출력장치가 인터페이스 모듈(240)을 통해 연결되는 것으로 설명하였으나, 모바일 타입 컴퓨팅 디바이스(200) 내부에 설치되는 실시예도 포함될 수 있다. Although it has been described as an example that the audio output device is connected through the interface module 240 , an embodiment in which the audio output device is installed inside the mobile type computing device 200 may also be included.

The interface module 240 may include at least one of a wired/wireless headset port, an external charger port, a wired/wireless data port, a memory card port, a port for connecting a device equipped with an identification module, an audio input/output (I/O) port, a video input/output (I/O) port, an earphone port, a power amplifier, an RF circuit, a transceiver, and other communication circuits.

입력 시스템(250)은 웹 페이지 기반의 메타데이터 생성 서비스와 관련된 사용자의 입력(예를 들어, 제스처, 음성 명령, 버튼의 작동 또는 다른 유형의 입력)을 감지할 수 있다. The input system 250 may detect a user's input (eg, a gesture, a voice command, an operation of a button, or other type of input) related to the web page-based metadata generation service.

자세히, 입력 시스템(250)은 버튼, 터치 센서 및 사용자 모션 입력을 수신하는 이미지 센서(261)를 포함할 수 있다. Specifically, the input system 250 may include a button, a touch sensor, and an image sensor 261 that receives user motion input.

또한, 입력 시스템(250)은, 인터페이스 모듈(240)을 통해 외부 컨트롤러와 연결되어, 사용자의 입력을 수신할 수 있다. Also, the input system 250 may be connected to an external controller through the interface module 240 to receive a user's input.

디스플레이 시스템(270)은, 웹 페이지 기반의 메타데이터 생성 서비스와 관련된 다양한 정보를 그래픽 이미지로 출력할 수 있다. The display system 270 may output various information related to a web page-based metadata generation service as a graphic image.

실시예에서, 디스플레이 시스템(270)은, 웹 페이지가 포함하는 텍스트 및/또는 이미지 등을 그래픽 이미지로 표시할 수 있다. In an embodiment, the display system 270 may display text and/or images included in a web page as graphic images.

Such a display may include at least one of a liquid crystal display (LCD), a thin film transistor-liquid crystal display (TFT LCD), an organic light-emitting diode (OLED) display, a flexible display, a three-dimensional (3D) display, and an electronic ink (e-ink) display.

이러한 모바일 타입 컴퓨팅 디바이스(200)의 하우징 내에는 상기 구성요소들이 배치될 수 있으며, 사용자 인터페이스는 사용자 터치 입력을 수신하도록 구성된 디스플레이(271) 상에 터치 센서(273)를 포함할 수 있다. The above components may be disposed within the housing of this mobile type computing device 200 , and the user interface may include a touch sensor 273 on a display 271 configured to receive user touch input.

자세히, 디스플레이 시스템(270)은, 이미지를 출력하는 디스플레이(271)와, 사용자의 터치 입력을 감지하는 터치 센서(273)를 포함할 수 있다.In detail, the display system 270 may include a display 271 that outputs an image and a touch sensor 273 that detects a user's touch input.

예시적으로 디스플레이(271)는 터치 센서(273)와 상호 레이어 구조를 이루거나 일체형으로 형성됨으로써, 터치 스크린으로 구현될 수 있다. 이러한 터치 스크린은, 모바일 타입 컴퓨팅 디바이스(200)와 사용자 사이의 입력 인터페이스를 제공하는 사용자 입력부로써 기능함과 동시에, 모바일 타입 컴퓨팅 디바이스(200)와 사용자 사이의 출력 인터페이스를 제공할 수 있다.Exemplarily, the display 271 may be implemented as a touch screen by forming a layer structure with the touch sensor 273 or being integrally formed therewith. Such a touch screen may function as a user input unit providing an input interface between the mobile type computing device 200 and the user, and may provide an output interface between the mobile type computing device 200 and the user.

2. Desktop type computing device

FIG. 3 is an internal block diagram of a desktop type computing device 300 according to an embodiment of the present invention.

Descriptions of components of the desktop type computing device 300 that overlap with those of the mobile type computing device 200 are replaced by the descriptions given above; the following focuses on the differences from the mobile type computing device 200.

Referring to FIG. 3, in another example, the desktop type computing device 300 may further include a device on which a program for executing the web page-based metadata generation service over wired/wireless communication is installed, such as a fixed desktop PC, a laptop computer, or a personal computer such as an ultrabook, on which the meta application 311 is installed.

In addition, the desktop type computing device 300 may include a user interface system 350 to receive user input (e.g., touch input, mouse input, keyboard input, gesture input, motion input using a guide tool, etc.).

For example, the desktop type computing device 300 may obtain user input by connecting the user interface system 350, over various communication protocols, to at least one device such as a mouse 351, a keyboard 352, a gesture input controller, an image sensor 361 (e.g., a camera), and an audio sensor 365.

또한, 데스크탑 타입 컴퓨팅 디바이스(300)는, 유저 인터페이스 시스템(350)을 통해 외부 출력 장치와 연결될 수 있으며, 예컨대, 디스플레이 장치(370), 오디오 출력 장치 등에 연결될 수 있다. In addition, the desktop type computing device 300 may be connected to an external output device through the user interface system 350 , for example, the display device 370 , an audio output device, or the like.

또한, 예시적인 구현에 따른 데스크탑 타입 컴퓨팅 디바이스(300)는, 메모리(310), 프로세서 어셈블리(320), 통신 모듈(330), 유저 인터페이스 시스템(350) 및 입력 시스템(340)을 포함할 수 있다. 이러한 구성요소들은 데스크탑 타입 컴퓨팅 디바이스(300)의 하우징 내에 포함되도록 구성될 수 있다. Further, the desktop type computing device 300 according to the example implementation may include a memory 310 , a processor assembly 320 , a communication module 330 , a user interface system 350 , and an input system 340 . . These components may be configured to be included within the housing of the desktop type computing device 300 .

다만, 본 발명의 실시예에서 도 2 및 3에 도시된 구성요소들은, 컴퓨팅 디바이스(100)를 구현하는데 있어 필수적인 것은 아니어서, 본 명세서 상에서 설명되는 컴퓨팅 디바이스(100)는 위에서 열거된 구성요소들 보다 많거나, 또는 적은 구성요소들을 가질 수 있다.However, in the embodiment of the present invention, the components shown in FIGS. 2 and 3 are not essential for implementing the computing device 100 , so the computing device 100 described in the present specification includes the components listed above. It may have more or fewer components.

- Metadata server: 400

한편, 본 발명의 실시예에서 메타데이터 서버(400)는, 웹 페이지 기반 메타데이터 생성 서비스를 제공하기 위한 일련의 프로세스를 수행할 수 있다. Meanwhile, in an embodiment of the present invention, the metadata server 400 may perform a series of processes for providing a web page-based metadata generation service.

In detail, in an embodiment, the metadata server 400 may provide a metadata generation service that crawls a web page based on at least one of the text of the web page, the characters in its images, and the alternative text for its images, and generates metadata describing the properties of that web page.

보다 상세히, 도 1을 더 참조하면, 위와 같은 메타데이터 서버(400)는, 메타데이터 서비스 제공서버(410), 메타데이터 생성 서버(420) 및 데이터베이스 서버(430)를 포함할 수 있다. 이때, 실시예에 따라서 상기 각 구성요소는, 메타데이터 서버(400)와는 별도의 장치로서 구현될 수도 있고, 메타데이터 서버(400)에 포함되어 구현될 수도 있다. 이하, 각 구성요소가 메타데이터 서버(400)에 포함되어 구현되는 것으로 설명하나 이에 한정되는 것은 아니다.In more detail, referring further to FIG. 1 , the above metadata server 400 may include a metadata service providing server 410 , a metadata generating server 420 , and a database server 430 . In this case, according to an embodiment, each of the components may be implemented as a device separate from the metadata server 400 , or may be implemented by being included in the metadata server 400 . Hereinafter, each component is described as being included in the metadata server 400 and implemented, but is not limited thereto.

여기서, 메타데이터 서비스 제공서버(410)는, 컴퓨팅 디바이스(100)에서 메타 어플리케이션(211, 311)이 동작할 수 있는 환경을 제공할 수 있다. Here, the metadata service providing server 410 may provide an environment in which the meta applications 211 and 311 can operate in the computing device 100 .

In an embodiment, the metadata service providing server 410 may include application programs, data, and/or instructions for implementing the meta applications 211 and 311, which generate metadata describing the properties of a web page based on the images and text in that web page.

또한, 메타데이터 생성 서버(420)는, 소정의 기준에 따라서 웹 페이지로부터 도출된 핵심 키워드에 기반한 메타데이터를 생성할 수 있다. Also, the metadata generating server 420 may generate metadata based on key keywords derived from a web page according to a predetermined criterion.

In an embodiment, the metadata generating server 420 may obtain core keywords derived from a web page, and may generate and provide metadata for that web page based on the predetermined criterion, which includes the position of each obtained core keyword in the web page, its font form, and/or its frequency of appearance in the web page.

또한, 데이터베이스 서버(430)는, 웹 페이지 기반 메타데이터 생성 서비스를 구현하기 위한 각종 응용 프로그램, 어플리케이션, 명령어 및/또는 데이터 등을 저장하고 관리할 수 있다. In addition, the database server 430 may store and manage various application programs, applications, commands and/or data for implementing a web page-based metadata generation service.

In an embodiment, the database server 430 may store and manage, for each web page, text information, image information (e.g., alternative text tag information), valid text information, core keyword information, metadata information, and/or web page search call screen information.

한편, 위와 같은 구성요소들을 포함하는 메타데이터 서버(400)는, 적어도 하나 이상의 메타데이터 서비스 제공서버(410), 메타데이터 생성 서버(420) 및/또는 데이터베이스 서버(430)로 구성될 수 있으며, 데이터 처리를 위한 프로세서들과, 웹 페이지 기반의 메타데이터 생성 서비스 제공을 위한 명령어들을 저장하는 메모리들을 포함할 수 있다.On the other hand, the metadata server 400 including the above components may be composed of at least one or more metadata service providing server 410, metadata generating server 420 and/or database server 430, It may include processors for data processing and memories for storing instructions for providing a web page-based metadata generation service.

또한, 실시예에 따라서 메타데이터 서버(400)에서 수행하는 동작의 전체 또는 일부 기능은, 컴퓨팅 디바이스(100)에 의하여 수행될 수도 있는 등 다양한 실시예가 가능하다. 이하의 설명에서는, 상술된 메타데이터 서버(400)가 수행하는 일련의 동작을 컴퓨팅 디바이스(100)에서 수행하는 것으로 설명하나, 이에 한정되는 것은 아니다. In addition, according to embodiments, all or some functions of the operations performed by the metadata server 400 may be performed by the computing device 100 , and various embodiments are possible. In the following description, a series of operations performed by the above-described metadata server 400 will be described as being performed by the computing device 100 , but the present invention is not limited thereto.

- Method of generating metadata based on a web page

이하, 첨부된 도면을 참조하여 본 발명의 실시예에 따른 웹 페이지 기반 메타데이터 생성방법에 대해 상세히 설명하고자 한다. 이하의 실시예에서는, 컴퓨팅 디바이스(100)를 모바일 타입 컴퓨팅 디바이스(200)에 기준하여 설명하기로 하나, 이에 한정되는 것은 아니다. Hereinafter, a method for generating web page-based metadata according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings. In the following embodiment, the computing device 100 will be described with reference to the mobile type computing device 200 , but is not limited thereto.

First, in an embodiment of the present invention, the mobile type computing device 200 may execute the meta application 211, which can generate metadata describing the properties of a web page based on the images and text in that web page.

FIG. 4 is a flowchart illustrating a method of generating metadata based on a web page according to an embodiment of the present invention.

도 4를 참조하면, 위와 같이 모바일 타입 컴퓨팅 디바이스(100)에서 실행된 메타 어플리케이션(211)은, 온라인 상의 웹 페이지(Web page)에 접속할 수 있다. (S101) Referring to FIG. 4 , the meta application 211 executed in the mobile type computing device 100 as described above may access an online web page. (S101)

In detail, a web page may be an Internet page that is implemented as a document, text, image, and/or video viewable online and is displayed in a browser through an Internet search. Such a web page may include simple forms or blank spaces, and may provide a unique URL (Uniform Resource Locator) through which the web page is accessed.

즉, 본 발명의 실시예에서 메타 어플리케이션(211)은, 웹 페이지별로 제공되는 고유 URL을 기반으로, 메타데이터를 생성하고자 하는 웹 페이지에 접속할 수 있다. That is, in the embodiment of the present invention, the meta application 211 may access a web page for which metadata is to be generated, based on a unique URL provided for each web page.

여기서, 실시예에 따른 메타데이터란, 웹 페이지의 속성을 함축하여 나타내는 데이터로서, 웹 페이지의 데이터를 효율적으로 이용하기 위하여 구조화한 데이터 정보일 수 있다. Here, the metadata according to the embodiment is data that implies and represents the properties of a web page, and may be data information structured in order to efficiently use data of a web page.

실시예에서, 이러한 메타데이터는, 검색엔진에서의 검색 시 활용되는 검색 색인으로 기능할 수 있다. In an embodiment, such metadata may function as a search index utilized during a search in a search engine.

또한, 실시예에 따른 메타데이터는, 복수의 메타데이터 섹션에 포함된 메타데이터 개체들로 이루어질 수 있다. Also, metadata according to an embodiment may be composed of metadata objects included in a plurality of metadata sections.

Here, a metadata section defines the information type of an attribute describing a web page, and in an embodiment may include a title section, a description section, and/or a tag word section for the web page. In addition, each metadata section may include metadata objects.

In this case, a metadata object means a series of metadata elements that describe the same aspect of the data of a web page.

실시예에서, 메타데이터 개체는, 웹 페이지를 이루는 요소들로부터 이루어질 수 있다. 예를 들어, 메타데이터 개체는, 웹 페이지 데이터에 포함된 텍스트 요소, 이미지 내에 표시된 글자를 추출하여 획득된 텍스트 요소들을 포함할 수 있다. In an embodiment, the metadata object may be formed from elements constituting a web page. For example, the metadata object may include text elements included in web page data and text elements obtained by extracting characters displayed in images.

즉, 실시예에서 메타데이터 개체는, 웹 페이지 내에 텍스트 및 웹 페이지 내 이미지에 포함된 글자에 대한 텍스트 중 핵심 키워드로 추출된 텍스트 요소들을 기초로 생성될 수 있다. That is, in the embodiment, the metadata object may be generated based on text elements extracted as key keywords from texts in the web page and texts included in images in the web page.

이러한 실시예에서 메타데이터 개체는, 제목 메타데이터 섹션에 포함되는 제목 섹션, 설명 메타데이터 섹션에 포함되는 설명 섹션 및/또는 태그어 메타데이터 섹션에 포함되는 태그어 섹션 등을 포함할 수 있다. In this embodiment, the metadata object may include a title section included in the title metadata section, a description section included in the description metadata section, and/or a tag word section included in the tag word metadata section, and the like.
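The section layout described above can be pictured as a small record type. The following is only a sketch under stated assumptions: the class name `PageMetadata`, its field names, and the example URL are hypothetical and do not come from the specification, which defines the title, description, and tag word sections only in prose.

```python
from dataclasses import dataclass, field


@dataclass
class PageMetadata:
    """Hypothetical container: one list of metadata objects per section."""

    url: str
    title: list = field(default_factory=list)        # title section
    description: list = field(default_factory=list)  # description section
    tags: list = field(default_factory=list)         # tag word section

    def sections(self):
        """Return the sections keyed by an (assumed) section name."""
        return {
            "title": self.title,
            "description": self.description,
            "tags": self.tags,
        }


metadata = PageMetadata(
    url="https://example.com/shoes",
    title=["Red shoes"],
    tags=["shoes", "sale"],
)
```

Sections that receive no text elements simply stay empty, which mirrors the "and/or" wording used for the sections.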

또한, 본 발명의 실시예에서 위와 같은 웹 페이지에 접속한 메타 어플리케이션(211)은, 접속된 웹 페이지 내의 텍스트를 추출할 수 있다. (S103) Also, in an embodiment of the present invention, the meta-application 211 accessing the above web page may extract text in the accessed web page. (S103)

자세히, 메타 어플리케이션(211)은, 웹 페이지에 포함된 텍스트 데이터(text data)를 해당 웹 페이지로부터 수신 및/또는 추출하여 획득할 수 있다. In detail, the meta application 211 may obtain text data included in a web page by receiving and/or extracting text data from the web page.
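As an illustration of step S103, the sketch below collects the text nodes of an HTML document with Python's standard `html.parser` module. It assumes the page has already been fetched into a string; the names `VisibleTextExtractor` and `extract_text` are hypothetical, and skipping `<script>` and `<style>` content is a choice made for this example, not something the specification prescribes.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the non-empty text chunks of an HTML string, in document order."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Each returned chunk corresponds to a word or sentence candidate that the later natural language processing step would refine.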

FIG. 5 is an example of a web page including an image according to an embodiment of the present invention.

However, referring to FIG. 5, web pages in practice often include text that has been rendered as an image.

Thus, the entire web page may be implemented as an image, as in (1) of FIG. 5, or at least part of the web page may be implemented as an image, as in (2) of FIG. 5.

Therefore, in an embodiment of the present invention, in order to generate metadata for a web page more accurately, the meta application 211 may, when an image exists in the accessed web page, extract the text of the web page by also taking into account the characters in the image.

In this case, the meta application 211 may use a predetermined in-image character extraction model and/or algorithm to extract the characters in the image.

For example, the meta application 211 may use an in-image character extraction model such as the Google Cloud Vision API to extract even the characters in the images of a web page.

참고적으로, 구글 클라우드의 비전 API는, REST 및 RPC API를 통해 선행 학습된 강력한 머신러닝 모델을 제공하며, 이미지에 라벨을 할당하고 사전 정의된 수백만 개의 카테고리로 빠르게 분류할 수 있다. 이러한 비전 API 모델은, 이미지 내의 객체나 인쇄 또는 필기 텍스트 등을 감지할 수 있으며, 이로부터 유용한 데이터를 추출하여 이미지에 기반한 메타데이터 생성 프로세스를 보조할 수 있다.For reference, Google Cloud's Vision API provides powerful machine learning models pre-trained via REST and RPC APIs, allowing you to label images and quickly classify them into millions of predefined categories. Such a vision API model can detect an object in an image, printed or handwritten text, etc., and extract useful data from it to assist the image-based metadata creation process.

However, in the present embodiment, the meta application 211 may use various known algorithms other than the Google Cloud Vision API described above to extract the characters in an image, and the present invention does not limit or restrict the algorithm used for in-image character extraction.

이때, 위와 같이 웹 페이지로부터 추출된 텍스트는, 단어 및/또는 문장의 형태일 수 있다. In this case, the text extracted from the web page as above may be in the form of words and/or sentences.

이처럼, 본 발명의 실시예에서 메타 어플리케이션(211)은, 웹 페이지 내의 일반 텍스트뿐만 아니라 웹 페이지가 포함하는 이미지 내의 글자까지 추출하여 자동생성되는 메타데이터에 기초 자료로 활용함으로써, 추후 해당 웹 페이지에 대해 생성되는 메타데이터의 정확성과 신뢰성을 보다 향상시킬 수 있다. As such, in the embodiment of the present invention, the meta application 211 extracts not only the plain text in the web page but also the characters in the image included in the web page and uses it as basic data for automatically generated metadata, so that the web page later It is possible to further improve the accuracy and reliability of the metadata generated for the data.

한편, 다른 실시예에서 메타 어플리케이션(211)은, 웹 페이지의 이미지에 매칭되는 대체 텍스트 태그가 존재하는 경우, 해당 대체 텍스트 태그에 포함되는 단어 및/또는 문장을 텍스트로 추출할 수 있다. Meanwhile, in another embodiment, when an alternative text tag matching an image of a web page exists, the meta application 211 may extract words and/or sentences included in the corresponding alternative text tag as text.

여기서, 대체 텍스트 태그란, 이미지에 대하여 기설정된 주요 단어 및/또는 문장이 해당 이미지에 매칭되어 저장된 정보일 수 있다. Here, the alternative text tag may be information stored by matching a predetermined main word and/or sentence to the image.

For example, when an alternative text tag reading 'This sentence is the key sentence for the first image.' exists in association with a first image of a web page, the meta application 211 may include the sentence preset in that alternative text tag in the text extracted from the web page.

As described above, in an embodiment, the meta application 211 extracts the text of a web page by using the alternative text tag, which is information in which the main text for an image in the web page has been preset, thereby increasing the efficiency of metadata generation.
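The alternative-text path can be sketched in the same style. This example assumes that the "alternative text tag" corresponds to the `alt` attribute of an HTML `<img>` element, which the specification does not state explicitly; `AltTextCollector` and `extract_alt_texts` are hypothetical names.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects non-empty alt attributes of <img> elements."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        alt = dict(attrs).get("alt")  # None when the attribute is absent
        if alt and alt.strip():
            self.alt_texts.append(alt.strip())


def extract_alt_texts(html):
    """Return the alt texts of all images in an HTML string, in document order."""
    collector = AltTextCollector()
    collector.feed(html)
    collector.close()
    return collector.alt_texts
```

Images without a usable `alt` value are skipped here; in the described method those images would instead go through in-image character extraction.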

또한, 본 발명의 실시예에서 메타 어플리케이션(211)은, 위와 같이 추출된 텍스트에 대한 자연어 처리(Natural Language Processing, NLP)를 수행할 수 있고, 이를 통해 유효 텍스트를 도출할 수 있다. (S105) In addition, in an embodiment of the present invention, the meta-application 211 may perform Natural Language Processing (NLP) on the extracted text as described above, and may derive valid text through this. (S105)

Here, natural language processing according to an embodiment may mean a process, carried out by a set of techniques, of analyzing, extracting, and understanding meaningful information in text.

또한, 유효 텍스트란, 인터넷을 통한 웹 페이지 검색 시 검색 색인으로 의미있는 텍스트를 의미할 수 있다. In addition, the valid text may mean a text meaningful as a search index when a web page is searched through the Internet.

In detail, the meta application 211 may perform natural language processing on the text extracted from the web page and set the valid text based on a known natural language processing technique (e.g., a tabular parsing algorithm, a Hidden Markov Model (HMM), and/or a Bi-LSTM-CRF (Bidirectional Long Short-Term Memory with Conditional Random Fields) model).

다만, 본 실시예에서 메타 어플리케이션(211)은, 텍스트에서 의미있는 정보를 분석하여 추출하는 자연어 처리를 수행하기 위하여 상술된 기술 이외에도 종래의 공지된 다양한 수학적 알고리즘을 이용할 수 있으며, 본 발명에서는 텍스트에서 의미있는 정보를 분석하여 추출하여 자연어 처리를 수행하는 알고리즘 자체를 한정하거나 제한하지는 않는다. However, in the present embodiment, the meta-application 211 may use various conventionally known mathematical algorithms in addition to the above-described techniques to perform natural language processing for analyzing and extracting meaningful information from text. It does not limit or limit the algorithm itself that performs natural language processing by analyzing and extracting meaningful information.

In this case, for example, the meta application 211 may perform natural language processing to apply stopword/stop-sentence processing to the text.

Here, stopword/stop-sentence processing may be processing that excludes stopwords and stop sentences, that is, words and sentences not used as search terms or search sentences in an Internet search, from the text extracted from the web page.

For example, stopword processing may be processing that performs part-of-speech tagging on the text extracted from the web page and excludes articles, prepositions, postpositional particles, and/or conjunctions that are meaningless as search index words.

In addition, stop-sentence processing may be processing that excludes sentences that are meaningless as search index entries by performing, on the text extracted from the web page, a text classification task (e.g., classifying advertising text and/or menu text unrelated to the web page) and/or lemmatization (described in the specification as a technique for identifying words based on their surrounding context).

In an embodiment, when natural language processing is performed on the text extracted from the web page in the form of words and/or sentences, the meta application 211 may exclude the stopwords and stop sentences from the extracted text and derive only the text that is meaningful as a search index for Internet searches.

그리고 메타 어플리케이션(211)은, 도출된 텍스트를 유효 텍스트로 설정하여 웹 페이지로부터 추출된 텍스트를 정제할 수 있다. In addition, the meta application 211 may refine the text extracted from the web page by setting the derived text as a valid text.
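A deliberately simplified sketch of the refinement in step S105 follows. It replaces the part-of-speech tagging and text classification described above with a tiny hand-written English stopword set and a few substring markers for menu or boilerplate sentences; `STOPWORDS`, `STOP_SENTENCE_MARKERS`, `valid_words`, and `valid_sentences` are all assumptions made for illustration.

```python
import re

# Assumed, illustrative stopword list; a real system would use POS tagging.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "for", "to", "is", "are"}

# Assumed markers for menu/boilerplate sentences; stands in for text classification.
STOP_SENTENCE_MARKERS = ("log in", "sign up", "all rights reserved")


def valid_words(text):
    """Lowercase, tokenize, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def valid_sentences(sentences):
    """Drop sentences that contain any stop-sentence marker."""
    kept = []
    for sentence in sentences:
        lowered = sentence.lower()
        if any(marker in lowered for marker in STOP_SENTENCE_MARKERS):
            continue
        kept.append(sentence)
    return kept
```

What remains after both filters plays the role of the valid text that is passed on to core keyword selection.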

또한, 본 발명의 실시예에서 메타 어플리케이션(211)은, 위와 같은 자연어 처리에 기초하여 획득된 유효 텍스트를 기반으로 웹 페이지를 요약할 수 있고, 이를 통해 핵심 키워드를 도출할 수 있다. (S107) In addition, in the embodiment of the present invention, the meta application 211 may summarize a web page based on the valid text obtained based on the above natural language processing, and may derive a key keyword through this. (S107)

In detail, the meta application 211 may summarize the web page and derive core keywords based on the valid text derived from the web page, that is, the text that is meaningful as a search index (the text extracted from the web page with stopwords and stop sentences excluded).

Here, a core keyword according to an embodiment may be text, among the valid text obtained through natural language processing, whose importance on the web page is determined to be at or above a predetermined criterion (e.g., inter-text similarity at or above a preset value).

Such core keywords may include core words, which are valid text in word form whose importance on the web page is determined to be at or above the predetermined criterion, and/or core sentences, which are valid text in sentence form whose importance on the web page is determined to be at or above the predetermined criterion.

In detail, the meta application 211 may summarize the web page based on the valid text using a known text summarization algorithm, and may derive the core keywords in this process.

예를 들면, 메타 어플리케이션(211)은, 텍스트랭크(TextRank) 알고리즘 등과 같은 텍스트 요약 알고리즘을 활용하여 웹 페이지에 대한 요약을 수행할 수 있다. For example, the meta application 211 may perform a summary of a web page by using a text summary algorithm such as a TextRank algorithm.

이때, 메타 어플리케이션(211)은, 텍스트랭크 알고리즘에 기반하여 웹 페이지의 유효 텍스트 간의 유사도를 분석할 수 있다. In this case, the meta application 211 may analyze the similarity between valid texts of the web page based on the TextRank algorithm.

The meta application 211 may then derive core keywords according to their importance in the web page, based on the analyzed similarity.

For example, the more other texts there are that have a high similarity to a first text of the valid text, the higher the importance the meta application 211 may assign to those texts, and it may select them as core keywords.

Here, the importance may mean a TF value (term frequency, i.e., exposure frequency) calculated through the TextRank algorithm, a TR (TextRank) value, a TF-IDF value, a PMI (pointwise mutual information) value, or a per-text (e.g., per-word or per-sentence) score derived from these values.
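The measures named above can be illustrated with a minimal sketch. The specification does not give formulas, so the textbook definitions are assumed here (relative frequency for TF, a logarithmic inverse document frequency, and document-level co-occurrence for PMI); the function names `tf`, `tf_idf`, and `pmi` are hypothetical.

```python
import math
from collections import Counter


def tf(tokens):
    """Relative term frequency of each token in one tokenized text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def tf_idf(docs):
    """TF-IDF per document, using idf = log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        term_freq = tf(doc)
        scores.append(
            {w: term_freq[w] * math.log(n_docs / doc_freq[w]) for w in term_freq}
        )
    return scores


def pmi(docs, a, b):
    """Pointwise mutual information of two words over document co-occurrence."""
    n_docs = len(docs)
    p_a = sum(a in doc for doc in docs) / n_docs
    p_b = sum(b in doc for doc in docs) / n_docs
    p_ab = sum(a in doc and b in doc for doc in docs) / n_docs
    if p_ab == 0:
        # The words never co-occur (or one of them never occurs at all).
        return float("-inf")
    return math.log2(p_ab / (p_a * p_b))
```

A per-text score of the kind the paragraph mentions could then be any combination of these values; how they are combined is left open by the specification.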

The core keywords may mean the subset of text classified out of the entire text on the basis of these per-text importance values. For example, the core keywords may include a preset number of texts extracted from all the text in the web page according to the magnitude of their importance values.

In addition, the meta application 211 may summarize the web page based on the derived core keywords.

For example, the meta application 211 may summarize the web page by retaining only the derived core keywords.

Alternatively, as an example, the meta application 211 may build a word graph or a sentence graph using the TextRank algorithm and then determine core words and/or core sentences from the given text using PageRank, a graph ranking algorithm.
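The graph-based ranking described here can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes the valid text is already tokenized, links words that co-occur within a small window (the `window` size, the damping factor 0.85, and the iteration count are assumptions), and ranks the resulting undirected weighted graph with a plain PageRank iteration. The names `build_word_graph` and `pagerank` are hypothetical.

```python
from collections import defaultdict


def build_word_graph(tokens, window=2):
    """Undirected co-occurrence graph: edge weight = number of co-occurrences."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by fixed-count power iteration."""
    nodes = list(graph)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbor passes on rank in proportion to the edge weight.
            incoming = sum(
                rank[nb] * graph[nb][node] / out_weight[nb] for nb in graph[node]
            )
            updated[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = updated
    return rank
```

For a sentence graph, the same `pagerank` function could be reused with sentences as nodes and a sentence similarity measure as the edge weight.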

For example, the meta application 211 may split the valid text into words, calculate an importance for each word based on similarity, exposure frequency, and the like, and determine some words of the valid text as core words by taking words in descending order of importance, limited to a predetermined reference number of core words or to words at or above a predetermined importance.

In addition, the meta application 211 may split the valid text into sentences, calculate an importance for each sentence, and determine some sentences of the valid text as core sentences by taking sentences in descending order of importance, limited to a predetermined reference number of core sentences or to sentences at or above a predetermined importance.
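The two limiting rules described above (a reference count and an importance threshold) can be sketched with one helper that works for words and sentences alike. The function name `select_core` and its parameters are hypothetical, and the importance scores are assumed to have been computed already.

```python
def select_core(scores, max_count=None, min_importance=None):
    """Return texts in descending order of importance, optionally limited
    by a minimum importance and/or a maximum number of items."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if min_importance is not None:
        ranked = [item for item in ranked if item[1] >= min_importance]
    if max_count is not None:
        ranked = ranked[:max_count]
    return [text for text, _ in ranked]
```

For instance, `select_core(word_scores, max_count=5)` would keep the five highest-ranked words, while `select_core(sentence_scores, min_importance=0.2)` would keep every sentence scoring at least 0.2.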

The meta application 211 may then combine the determined core words and/or core sentences to organize the web page into a summarized text document.

That is, the meta application 211 may derive core keywords, including core words and/or core sentences, from the valid text of the web page based on the TextRank algorithm, and may generate a text document summarizing the web page based on the derived core keywords.

However, in this embodiment the meta application 211 may use various conventionally known mathematical algorithms other than the techniques described above to derive core keywords from text, and the present invention does not limit or restrict the algorithm itself used for deriving core keywords from text.

In addition, in an embodiment of the present invention, the meta application 211 may generate metadata based on the derived core keywords (S109).

To recap, the metadata according to the embodiment is data that concisely represents the attributes of a web page; it may be data structured so that the data of the web page can be used efficiently, and it may function as a search index used when a search engine performs a search.

In an embodiment, the metadata may include title, description, and/or tag word metadata sections for the web page.

In an embodiment, the metadata may also include at least one metadata object classified into each metadata section.

In detail, the meta application 211 may generate metadata, according to predetermined criteria, based on the core keywords (including core words and/or core sentences), which are the text meaningful as a search index for the web page.

In more detail, the meta application 211 may set as the predetermined criteria, for each core keyword derived from the web page, 1) its position in the web page (e.g., top, middle, or bottom), 2) its font style (e.g., font size and/or weight), 3) its importance, and/or 4) its frequency of appearance in the web page.

In addition, the meta application 211 may classify each core keyword into at least one of the title metadata section, the description metadata section, and the tag word metadata section according to the set criteria, and may thereby set it as a metadata object of the section into which it is classified.

In detail, the meta application 211 may generate a title metadata object, which is included in the title section, based on some of the core keywords.

For example, the meta application 211 may extract n core words in descending order of importance and generate a sentence containing all n extracted core words as the metadata object of the title section.

The meta application 211 may also extract m core sentences in descending order of importance and list the m extracted core sentences to generate the metadata object of the title section.

In another embodiment, the meta application 211 may determine the core sentence that will become the metadata object of the title section based on the position and/or form of the characters in the image from which the core sentence was derived.

FIG. 6 is a diagram for explaining a method of generating metadata based on a web page including an image according to an embodiment of the present invention.

For example, referring to FIG. 6, when the characters underlying a core sentence occupy the largest area in the image, the meta application 211 may determine the core sentence for those characters as the metadata object of the title section.

In addition, when the characters underlying a core sentence are located at the very top of the image, the meta application 211 may determine the core sentence for those characters as the metadata object of the title section.

The meta application 211 may also calculate a score for each core sentence based on the position and size of its underlying characters, and determine the core sentence for the characters with the highest score as the metadata object of the title section.
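One way such a position-and-size score could look is sketched below. The specification does not define the scoring function, so everything here is an assumption: text regions are taken to be bounding boxes from an OCR step, the size term is the region's share of the image area, the position term is higher the closer the region is to the top, and the two terms are combined with equal weights. `TextRegion`, `region_score`, and `pick_title` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TextRegion:
    text: str      # core sentence recognized from the characters
    top: float     # y coordinate of the top edge (0 = top of the image)
    width: float
    height: float


def region_score(region, image_height, image_area,
                 size_weight=0.5, position_weight=0.5):
    """Weighted sum of relative area and closeness to the top of the image."""
    size = (region.width * region.height) / image_area
    position = 1.0 - region.top / image_height
    return size_weight * size + position_weight * position


def pick_title(regions, image_width, image_height):
    """Return the text of the highest-scoring region."""
    area = image_width * image_height
    best = max(regions, key=lambda r: region_score(r, image_height, area))
    return best.text
```

Setting `position_weight=0` reduces this to the "largest area" rule, and `size_weight=0` to the "topmost" rule, which are the two simpler variants described just before.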

For example, among the core sentences of the core keywords extracted from the web page shown in FIG. 6, the meta application 211 may classify into the title section the core sentence made up of the characters 'Powerful Sunshine Rain moisture falling on my skin', which are positioned at the very top of the web page, and the core sentence for the characters 'Sunshine Rain Line', which are the largest in size. It may then generate the combination of the classified core sentences, 'Powerful Sunshine Rain moisture falling on my skin Sunshine Rain Line', as the metadata object of the title section.

In this case, depending on the embodiment, the meta application 211 may also set the title section directly based on an alternative text tag that may be stored in advance in association with an image of the web page.

In detail, when an alternative text tag matching an image of the web page exists, the meta application 211 may set the alternative text tag as the title section regardless of the derived core keywords.
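As a rough sketch of this override, the example below assumes that the "alternative text tag" corresponds to the `alt` attribute of an HTML `img` element and that the first non-empty `alt` value is used; both are assumptions, as are the names `AltCollector` and `title_from_alt`. It uses only the standard-library `html.parser` module.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects the non-empty alt attributes of <img> elements."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt and alt.strip():
                self.alts.append(alt.strip())


def title_from_alt(html, keyword_title):
    """Prefer image alt text over the keyword-derived title when present."""
    collector = AltCollector()
    collector.feed(html)
    return collector.alts[0] if collector.alts else keyword_title
```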

In an embodiment, the meta application 211 may set a limit on the total number of characters of the title metadata object generated when the title section is set.

For example, the meta application 211 may set the limit on the total number of characters of the title metadata object to 'within 40 characters including spaces', which helps produce a title section of appropriate length.

In addition, when setting the title section, the meta application 211 may include, in addition to the generated title metadata object, additional information for improving the completeness of the search call screen provided in a later search.

In an embodiment, the meta application 211 may set the title section to include the generated title metadata object, the name of the web site, and a special character that helps separate the title metadata object from the web site name.

For example, the meta application 211 may compose the title section by listing the 'title metadata object', the 'special character', and the 'web site name' in that order.
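The title composition described above can be sketched as a single helper. The 40-character limit comes from the example in the text; enforcing it by truncation, the `|` separator, and the name `compose_title` are assumptions made for illustration.

```python
def compose_title(title_object, site_name, separator="|", limit=40):
    """Build 'title object <separator> site name', capping the title
    object at `limit` characters (spaces included)."""
    trimmed = title_object[:limit].rstrip()
    return f"{trimmed} {separator} {site_name}"
```

With the FIG. 6 example, `compose_title("Sunshine Rain Line", "Example Shop")` would yield `Sunshine Rain Line | Example Shop`, where "Example Shop" stands in for the web site name.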

In addition, the meta application 211 may generate a description metadata object, which is included in the description section, based on at least some of the core keywords.

For example, the meta application 211 may extract x core words in descending order of importance and generate at least one sentence combining the x extracted core words as the description metadata object.

The meta application 211 may also extract y core sentences in descending order of importance and list the y extracted core sentences to generate the metadata object of the description section.

For example, among the core sentences of the core keywords extracted from the web page shown in FIG. 6, the meta application 211 may set sentences such as 'Sunshine Rain Honey Moisturizing Cleansing Foam', 'Sunshine Rain Moist Toner', 'Sunshine Rain Dodum Lotion', 'Sunshine Rain Cleansing Water', and 'Sunshine Rain Moisture Cream' as metadata objects of the description section.

Various embodiments are possible here; for example, depending on the embodiment, the meta application 211 may also set all or part of the title section as the description section.

For example, the meta application 211 may compose the description section so that it also includes the sentence 'Sunshine Rain Line', which was set as the title section.

In an embodiment, the meta application 211 may also directly set, as the description section, core sentences extracted based on the similarity of words that appear frequently in the web page document.

In this case, the meta application 211 may set a limit on the total number of characters of the description metadata object of the description section.

For example, the meta application 211 may set the limit on the total number of characters of the description metadata object to 'within 80 characters including spaces', so that a description section of appropriate length is produced.

Meanwhile, the meta application 211 may select at least some of the core words included in the core keywords as tag word metadata objects included in the tag word section.

In an embodiment, the meta application 211 may extract up to a preset number k of core words in descending order of importance and set the extracted core words as tag word metadata objects in the tag word section.

In another embodiment, the meta application 211 may calculate the frequency of appearance of each core word in the web page and, when the calculated frequency satisfies a predetermined criterion (e.g., at least a preset count and/or within the top 1% of the appearance frequencies of all core words), may classify that core word into the tag word metadata section and set it as the tag word section.
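The frequency criterion can be sketched as follows. This is an illustration under assumptions: the page is already tokenized, `core_words` contains no duplicates, at least one word is always kept for the top-fraction rule, and when both criteria are supplied they are combined with AND (the text allows "and/or"). The name `tag_words` and its parameters are hypothetical.

```python
import math
from collections import Counter


def tag_words(core_words, page_tokens, min_count=None, top_fraction=None):
    """Select core words whose page frequency meets the given criteria."""
    counts = Counter(page_tokens)
    freq = {word: counts[word] for word in core_words}
    selected = set(core_words)
    if min_count is not None:
        selected &= {w for w in core_words if freq[w] >= min_count}
    if top_fraction is not None:
        # Keep the most frequent share of core words, at least one word.
        keep = max(1, math.ceil(len(core_words) * top_fraction))
        ranked = sorted(core_words, key=lambda w: freq[w], reverse=True)
        selected &= set(ranked[:keep])
    return sorted(selected, key=lambda w: freq[w], reverse=True)
```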

For example, among the core words of the core keywords extracted from the web page shown in FIG. 6, the meta application 211 may classify the word 'Sunshine Rain', which is the core word with the highest frequency of appearance in the web page, into the tag word metadata section and set it as the tag word section.

In this case, depending on the embodiment, the meta application 211 may also cause at least part of the tag word section to be set as the title section.

For example, the meta application 211 may compare the appearance frequency of a first tag word object, which has the highest frequency of appearance in the web page among the tag word section, with the appearance frequency of a first word, which has the highest frequency of appearance in the web page among the words contained in the core sentence of the title section. When the first tag word object appears more frequently than the first word, the meta application 211 may set the first tag word object as the title section.
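The comparison in this example can be sketched as below. Splitting the title sentence on whitespace and counting exact token matches are simplifying assumptions (Korean text would normally need a morphological analyzer), and `maybe_replace_title` is a hypothetical name.

```python
from collections import Counter


def maybe_replace_title(title_sentence, tags, page_tokens):
    """Replace the title with the most frequent tag word when that tag
    appears strictly more often than any word of the title sentence."""
    if not tags:
        return title_sentence
    counts = Counter(page_tokens)
    first_tag = max(tags, key=lambda tag: counts[tag])
    top_title_count = max(
        (counts[word] for word in title_sentence.split()), default=0
    )
    if counts[first_tag] > top_title_count:
        return first_tag
    return title_sentence
```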

In an embodiment, having classified the core keywords derived from the web page into the metadata sections and set them as metadata objects, the meta application 211 may generate metadata for the web page based on each metadata section and its metadata objects.

That is, the meta application 211 may generate metadata for the web page based on the title, description, and/or tag word sections into which the core keywords have been classified according to the predetermined criteria.
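Assembling the three sections into one metadata record might look like the sketch below. The dictionary layout, joining description objects with a space, and truncating to the 80-character limit mentioned earlier are all assumptions; `build_metadata` is a hypothetical name.

```python
def build_metadata(title, descriptions, tags, description_limit=80):
    """Collect the title, description, and tag word sections in one record."""
    description = " ".join(descriptions)[:description_limit].rstrip()
    return {
        "title": title,
        "description": description,
        "tags": list(tags),
    }
```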

As described above, in an embodiment of the present invention, the meta application 211 sets each core keyword as a title, description, and/or tag word according to its position, font style, and/or frequency of appearance in the web page. This can improve the performance of the crawling process for the information provided by the web page and, in turn, the quality of the metadata generated from it.

In addition, in an embodiment of the present invention, the meta application 211 may provide an editing interface for the metadata generated as described above (S111).

FIG. 7 is an example of providing an editing interface for metadata according to an embodiment of the present invention.

In detail, referring to FIG. 7, the meta application 211 may provide a metadata editing interface through which the title, description, and/or tag word sections included in the generated metadata can be edited.

In an embodiment, the meta application 211 may correct the title, description, and/or tag word sections based on user input to the provided editing interface.

For example, when the title section of the generated metadata is 'This sentence is the title section.', the meta application 211 may change the title section to 'This sentence is the changed title section.' according to user input through the editing interface.

In this way, by providing an editing function for the automatically generated metadata, the meta application 211 can improve the quality and accuracy of the metadata based on the user's own judgment and can set metadata tailored to the user's needs.

In addition, in an embodiment of the present invention, when generating metadata as described above, the meta application 211 may provide various interfaces that assist the metadata generation process.

In detail, the meta application 211 may provide 1) an attribute suitability interface and 2) a preview interface on the screen used for generating metadata.

In more detail, referring to FIG. 9, when generating metadata, the meta application 211 may provide an attribute suitability interface that shows the attribute suitability, that is, the result of judging whether each metadata object included in a section is suitable for inclusion in that section.

In this case, the meta application 211 may judge the attribute suitability of the metadata objects in each section using, for example, the classification criteria of that section.

For example, for the title section, the meta application 211 may judge the attribute suitability of the title metadata object based on whether it contains a core keyword and/or whether the same text exists on other web pages.

In this case, the meta application 211 may present the attribute suitability through the attribute suitability interface according to a predetermined criterion (e.g., classification into levels by attribute suitability value range).

In detail, the attribute suitability interface may display the attribute suitability of the metadata objects belonging to each section as a graphic image (e.g., a bar graph), and may include an interface that explains the criteria used to judge that attribute suitability.

For example, when the attribute suitability is classified by value range into the levels 'very low, low, normal, high, very high' and the calculated attribute suitability value is judged to fall in the 'normal' level, the attribute suitability interface may display that attribute suitability through a graphic image implemented as a bar graph.
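The level classification and bar display can be sketched as follows. The specification names the five levels but not their boundaries, so this sketch assumes a suitability value normalized to the range 0 to 1 and five equal-width intervals; the text-based bar and the names `suitability_level` and `bar` are likewise hypothetical.

```python
LEVELS = ["very low", "low", "normal", "high", "very high"]


def suitability_level(value):
    """Map a suitability value in [0, 1] to one of five equal-width levels."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("suitability value must be between 0 and 1")
    index = min(int(value * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]


def bar(value, width=20):
    """Render the value as a simple text bar graph of the given width."""
    filled = round(value * width)
    return "#" * filled + "-" * (width - filled)
```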

The attribute suitability interface may also display, as text, a description of the criteria by which the attribute suitability was calculated (e.g., 'The core keywords of the web page are not included').

Meanwhile, referring to FIG. 10, in an embodiment of the present invention, when the metadata has been changed through the process described above, the meta application 211 may provide a preview interface that shows the web page search call screen, which will later be generated from that metadata, as it appears before and after the change.

In detail, the preview interface may generate graphic images of the web page search call screen before and after the changed metadata is applied, and may provide them at the user's request before the changed metadata is actually applied to the web page search call screen.

That is, before the modified metadata is actually applied, the meta application 211 may show the search call screens from before and after the modification so that the user can compare them, check the screen that will finally be provided, and revise the metadata again if necessary. In this way it can help the metadata be reviewed effectively.

In addition, in an embodiment of the present invention, the meta application 211 may provide a web page search call screen based on the generated metadata (S113).

Here, the web page search call screen may be an interface that provides information about the web page and is displayed on the search result screen when a metadata-based web page search is performed on a search engine.

FIG. 8 is an example of providing a web page search call screen based on metadata according to an embodiment of the present invention.

In detail, referring to FIG. 8, in an embodiment the meta application 211 may generate the web page search call screen by arranging at least one metadata object included in the title, description, and/or tag word metadata sections of the metadata according to a predetermined criterion (e.g., by metadata section).

For example, the meta application 211 may place the title section of the metadata at the top of the web page search call screen. It may place the at least one description section included in the metadata in the center of the web page search call screen, and the at least one tag word section included in the metadata at the bottom of the web page search call screen.
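The top-to-bottom arrangement can be sketched with a plain-text rendering. The dictionary keys, the `#` prefix on tag words, and the name `render_search_entry` are assumptions for illustration; an actual search call screen would be a graphical interface.

```python
def render_search_entry(metadata):
    """Lay out one search result: title first, descriptions next, tags last."""
    lines = [metadata["title"]]
    lines.extend(metadata.get("descriptions", []))
    tags = metadata.get("tags", [])
    if tags:
        lines.append(" ".join(f"#{tag}" for tag in tags))
    return "\n".join(lines)
```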

In an embodiment, the meta application 211 may output the web page search call screen, in which the metadata objects of the metadata are arranged, on the search result screen shown when a web page search based on the metadata is performed on a search engine.

In this way, the meta application 211 generates and provides the web page search call screen based on metadata that takes into account not only the plain text in the web page but also the characters in the images the web page contains. It can therefore display search results that summarize the information on the web page more accurately, and can allow the web page to be found through a search engine based on a more accurate search index.

As described above, the method for generating metadata based on a web page according to an embodiment of the present invention performs crawling based on one or more of the text of the web page, the characters in its images, and the alternative text for its images, and provides metadata describing the attributes of the web page. It can thereby provide metadata that takes into account not only the plain text in the web page but also the text associated with the images the web page contains.

In addition, the method for generating metadata based on a web page according to an embodiment of the present invention automatically generates metadata for a web page and provides an editing interface for the automatically generated metadata, so that metadata that is updated frequently and is difficult for an individual to manage can be defined and managed easily.

In addition, the embodiments of the present invention described above may be implemented in the form of program instructions that can be executed through various computer components, and may be recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. A hardware device may be replaced by one or more software modules to perform the processing according to the present invention, and vice versa.

The specific implementations described in the present invention are examples and do not limit the scope of the present invention in any way. For brevity of the specification, descriptions of conventional electronic components, control systems, software, and other functional aspects of the systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device they may be replaced by, or supplemented with, various other functional, physical, or circuit connections. Furthermore, unless a component is specifically described with terms such as "essential" or "important", it may not be a necessary component for the application of the present invention.

Although the detailed description of the present invention has been given with reference to preferred embodiments, those skilled in the art, or those having ordinary knowledge in the art, will understand that the present invention may be variously modified and changed without departing from the spirit and technical scope of the present invention set forth in the claims below. Accordingly, the technical scope of the present invention should not be limited to the contents of the detailed description of the specification, but should be defined by the claims.

Claims

1. A method for generating metadata based on a web page, performed by a metadata generating application running on a processor of a computing device, the method comprising:
extracting text included in the web page;
extracting text based on characters in an image included in the web page;
extracting a portion of the extracted text as valid text by performing natural language processing on the extracted text;
calculating the importance of the extracted valid text and determining a part of the extracted valid text as a core keyword according to the calculated importance; and
generating the metadata based on the determined core keyword.
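
Illustrative note (not part of the claims): the five steps of claim 1 can be sketched end to end as below. This is a minimal sketch under stated assumptions, not the applicant's implementation. The function name `generate_metadata`, the `ocr` callback (a stand-in for a character recognition step), the tiny stopword list, and the use of term frequency as the importance measure are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Illustrative stopword list (an assumption, not taken from the patent).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "is"}


class _TextAndImages(HTMLParser):
    """Collects visible text chunks and <img> src values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.texts, self.images, self._skip = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.texts.append(data.strip())


def generate_metadata(html, ocr=lambda src: "", top_k=3):
    """Hypothetical pipeline: page text + image text -> valid text -> core keywords."""
    parser = _TextAndImages()
    parser.feed(html)
    # Steps 1 and 2: page text plus text recognized in images (ocr is a stand-in).
    raw = parser.texts + [t for t in (ocr(src) for src in parser.images) if t]
    # Step 3: a very rough "valid text" filter: lowercase words minus stopwords.
    words = [w for w in re.findall(r"[a-z]+", " ".join(raw).lower())
             if w not in STOPWORDS]
    # Step 4: importance approximated by term frequency (an assumption).
    keywords = [w for w, _ in Counter(words).most_common(top_k)]
    # Step 5: metadata built from the core keywords.
    return {"keywords": keywords}
```

The `ocr` parameter is left to the caller so that the sketch stays self-contained; any real character recognition component would be plugged in there.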

2. The method of claim 1,
wherein the step of extracting text based on the characters in the image included in the web page comprises
extracting an image included in the web page, extracting characters from the extracted image, and converting the extracted characters into text.

3. The method of claim 1,
wherein the step of extracting a portion of the extracted text as valid text comprises
removing stopwords from the extracted text.
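
Illustrative note (not part of the claims): stopword removal as recited in claim 3 might look like the following. The stopword list and the function name `remove_stopwords` are assumptions; a real system would use a list suited to the language of the page.

```python
import re

# Illustrative English stopword list (an assumption, not taken from the patent).
STOPWORDS = frozenset(
    {"a", "an", "and", "the", "of", "for", "to", "in", "on", "is", "are"}
)


def remove_stopwords(text, stopwords=STOPWORDS):
    """Return the tokens of `text` that are not stopwords, in original order."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]
```

For example, `remove_stopwords("The price of the shoes is low")` keeps only `price`, `shoes`, and `low`.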

4. The method of claim 1,
wherein the step of determining a part of the extracted valid text as a core keyword comprises
separating the valid text into word units, calculating the importance of the separated words, and determining core words, selected in descending order of importance, as the core keyword.
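
Illustrative note (not part of the claims): claim 4 does not fix a particular importance measure. The sketch below assumes relative term frequency as the measure; the names `word_importance` and `core_words` and the `limit` parameter are hypothetical.

```python
import re
from collections import Counter


def word_importance(valid_text):
    """Map each word to its relative frequency in `valid_text`."""
    words = re.findall(r"\w+", valid_text.lower())
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()} if total else {}


def core_words(valid_text, limit=5):
    """Return up to `limit` words in descending order of importance.

    Ties are broken alphabetically so that the result is deterministic.
    """
    scores = word_importance(valid_text)
    return sorted(scores, key=lambda w: (-scores[w], w))[:limit]
```

Other measures, such as a weighting that discounts words common across many pages, could be substituted for `word_importance` without changing the selection step.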

5. The method of claim 4,
wherein the step of determining a part of the extracted valid text as a core keyword comprises
separating the valid text into sentence units, calculating the importance of the separated sentences, and determining core sentences, selected in descending order of importance, as the core keyword.
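
Illustrative note (not part of the claims): one simple way to rank sentences, assumed here purely for illustration, is to score each sentence by the mean frequency of its words across the whole text. The function name `rank_sentences` and the punctuation-based sentence splitter are hypothetical.

```python
import re
from collections import Counter


def rank_sentences(text, top_n=2):
    """Return the `top_n` sentences with the highest mean word frequency."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(w for words in tokenized for w in words)
    scored = []
    for sentence, words in zip(sentences, tokenized):
        score = sum(freq[w] for w in words) / len(words) if words else 0.0
        scored.append((score, sentence))
    # Stable sort keeps document order among equally scored sentences.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top_n]]
```

Dividing by the sentence length keeps long sentences from winning merely because they contain more words.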

6. The method of claim 5,
wherein the metadata comprises
a title metadata object of a title section used as a title of the web page, a description metadata object of a description section used as a description of the web page, and a tag word metadata object of a tag word section used as tag words of the web page.
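
Illustrative note (not part of the claims): the three metadata objects of claim 6 can be modeled as a small record. The class name `PageMetadata` and the rendering to HTML `<title>` and `<meta>` elements are assumptions; the claim does not prescribe an output format.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class PageMetadata:
    title: str                                 # title section
    description: str                           # description section
    tags: list = field(default_factory=list)   # tag word section

    def to_html(self):
        """Render the three sections as HTML head elements (an assumed format)."""
        return "\n".join([
            f"<title>{escape(self.title)}</title>",
            f'<meta name="description" content="{escape(self.description)}">',
            f'<meta name="keywords" content="{escape(", ".join(self.tags))}">',
        ])
```

Escaping each value guards against characters such as `&` or `"` in extracted text breaking the generated markup.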

7. The method of claim 6,
wherein the step of generating the metadata comprises
setting at least one of the core sentences as the title metadata object according to the position or size of the characters in the image from which the core sentence was derived.
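
Illustrative note (not part of the claims): a possible selection rule for claim 7, assumed here for illustration, prefers the sentence whose characters are largest in the source image and breaks ties by choosing the one nearest the top. The `ImageSentence` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImageSentence:
    text: str
    top: int     # y coordinate of the text box in the image, in pixels (0 = top edge)
    height: int  # height of the characters, in pixels


def pick_title(sentences):
    """Prefer the largest characters; break ties by the topmost position."""
    if not sentences:
        return None
    return max(sentences, key=lambda s: (s.height, -s.top)).text
```

The rule reflects the common layout habit of placing a headline in large type near the top of a banner; other weightings of position and size are equally compatible with the claim language.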

8. The method of claim 6,
wherein the step of generating the metadata comprises
setting at least a part of the core sentences as the description metadata object.

9. The method of claim 6,
wherein the step of generating the metadata comprises
extracting core words up to the k-th in order of importance, and setting the extracted core words as the tag word metadata object.
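
Illustrative note (not part of the claims): selecting tag words up to the k-th in order of importance reduces to a sort followed by a slice. The function name `top_k_tags` and the input shape (a mapping from word to importance score) are assumptions.

```python
def top_k_tags(word_scores, k):
    """Return the k highest-scoring words; ties are broken alphabetically."""
    ranked = sorted(word_scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]
```

If fewer than `k` words are available, the function simply returns all of them.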

10. The method of claim 1,
further comprising providing a user editing interface for the generated metadata.