KR101044633B1 - Semantic Web-based Index Method and Search Engine Using the Same

Info

Publication number: KR101044633B1
Application number: KR1020080095268A
Authority: KR
Inventors: 조광현
Original assignee: 조광현; 주식회사 시맨틱스
Priority date: 2007-09-27
Filing date: 2008-09-29
Publication date: 2011-07-01
Also published as: KR20090033149A

Abstract

The present invention relates to a web search technology, and more particularly, to a semantic web-based indexing method for constructing a search database using the semantic web, and a search engine using the same.

The indexing method of the present invention includes: a web page collection step of collecting web pages distributed on the Internet, processing them into semantic web pages, and storing them in a semantic web page database; a semantic analysis step of extracting the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, establishing frequency, relationship, and graph data for each level, and storing the results in a semantic web analysis database; a filtering step of filtering the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels and storing them in a semantic web filtered database; a personality analysis step of assigning a personality to each filtered semantic web page; and a classification analysis step of classifying the semantic web pages to which a personality has been assigned according to the percentage (%) of each personality. The collected web pages are thus converted into semantic web pages and analyzed at the word, paragraph, and article levels, so that multiple indexes are generated for a single web page.

Index, Search Engine, Semantic Web, Filtering, Analysis

Description

Semantic web-based indexing method and search engine using the same {SEMANTIC WEB BASED INDEX METHOD AND SEARCH ENGINE USING THE SAME}

In general, conventional portal (search) sites such as Naver, Dreamwiz, Daum, and Yahoo consist of a database that classifies and stores web site information according to predetermined criteria, a search robot that mechanically collects new web site information while continuously traversing the web, and a search engine that builds the collected data into a database so that users of the portal (search) site can search it. In response to a keyword entered by the user, the site searches the database and provides a list of sites similar to the keyword.

FIG. 1 is a diagram showing the overall structure of a general search engine.

Referring to FIG. 1, an Internet search engine is an information retrieval system that enables searching for documents on the web, and may be broadly divided into data collection (S1), indexing (S2), and search (S3). In the data collection (S1) part, document collection programs 12, called spiders or crawlers, follow link information to collect web documents stored on computers around the world connected to the World Wide Web network 11, and store them in the database 13.

In the indexing (S2) part, the index module 14 stores index information for the collected web documents in the index database 16, in order to speed up the search and reduce the amount of data to be stored.

In the search (S3) part, whenever the searcher 17 enters the desired information, the search engine 18 searches the index information stored in the index database 16, the ranking system 20 ranks the search results, and the results are provided to the searcher 17 in ranked order. The search engine 18 also controls the spider program 12 through the spider control 19 to improve search performance, and the index module 14 and the analysis module 15 analyze the collected web documents and process them for indexing.
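The three parts described above (data collection, indexing, search with ranking) can be sketched as a toy keyword search engine. This is only an illustration and not the system of FIG. 1: the in-memory `DOCUMENTS` dictionary stands in for the database 13, `build_index` for the index module 14, and the term-count ordering for the ranking system 20. All names and the scoring rule are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the collected documents in database 13.
DOCUMENTS = {
    "doc1": "semantic web search engine",
    "doc2": "web crawler collects web documents",
    "doc3": "ranking of search results",
}


def build_index(documents):
    """Build an inverted index: term -> {doc_id: term count}."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Return doc ids ranked by the total count of matching query terms."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Sort by descending score, then by doc id so ties have a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked]


index = build_index(DOCUMENTS)
print(search(index, "web search"))  # ['doc1', 'doc2', 'doc3']
```

Each page contributes exactly one entry set to one flat index here, which is the single-index arrangement the description attributes to conventional engines.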

These Internet search engines are classified into directory search engines, keyword search engines, and meta search engines according to the search method. A directory search engine classifies materials by subject or category and builds a database by adding explanations and evaluations. A keyword search engine collects web documents with a web document collection program, stores the collected documents in the search engine's database through an indexing process, and answers the user's query by keyword matching. A meta search engine forwards the searcher's query to other existing search engines, collects their results, and shows the combined results to the searcher, so the searcher obtains varied search results. Its advantage is that it needs no internal storage space for data.

The Semantic Web, on the other hand, gives well-defined meaning to information on the web so that computers as well as humans can easily interpret the meaning of documents. It was proposed for the purpose of automating, by computer, tasks such as searching, interpreting, and integrating information.

Unlike existing web documents, which are centered on natural language, Semantic Web documents carry meaning that computers can easily interpret, so automated agents or sophisticated search engines can use that meaning to achieve a high level of automation and intelligence.

In the Semantic Web, resources are expressed as triples of resource, attribute, and attribute value, and RDF (Resource Description Framework) is defined as the framework for representing resources. The Semantic Web uses SPARQL as the query language for retrieving resource information expressed in RDF; SPARQL also serves as a protocol for transmitting queries in a client-server environment.

The general goal of an information retrieval system is to accurately understand both the user's intention and the stored documents, and to retrieve efficiently from a large amount of stored information so that the documents the user needs are delivered without omission.
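The triple form mentioned above can be illustrated with plain Python tuples. The sketch below is not an RDF library and not a SPARQL implementation; it only mimics the pattern-matching idea behind a SPARQL query. The resource names (`page:1`, `hasTopic`, and so on) are invented for the example.

```python
# Each fact is a (resource, attribute, value) triple, as in RDF.
# The resources and attributes below are made-up examples.
TRIPLES = [
    ("page:1", "hasTopic", "economy"),
    ("page:1", "hasAuthor", "Kim"),
    ("page:2", "hasTopic", "politics"),
    ("page:3", "hasTopic", "economy"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Roughly what the SPARQL query
#   SELECT ?page WHERE { ?page :hasTopic "economy" }
# would ask of an RDF store.
economy_pages = [s for (s, _, _) in match(TRIPLES, predicate="hasTopic", obj="economy")]
print(economy_pages)  # ['page:1', 'page:3']
```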

However, a conventional search engine such as Google has no additional indexing process between gathering and indexing; pages are simply indexed in order ('A, B, C, D, ...'), and there is one index per web page.

Therefore, a conventional search engine often returns the wrong web pages for the keyword entered by the user, which makes searching inconvenient, and the user must enter keywords repeatedly until the desired information is obtained.

The present invention has been proposed to solve the above problems. An object of the present invention is to provide a semantic web-based indexing method, and a search engine using the same, that build indexes on a web page in various ways through the semantic web and perform a meaning search for the keywords entered by a user, so that the user can quickly find information that matches the user's intention.

To achieve the above object, the search engine of the present invention includes: an indexing unit comprising a gathering agent that collects web pages distributed on the Internet, processes them into semantic web pages, and stores them in a semantic web page database, a semantic analysis agent that extracts the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, establishes frequency, relationship, and graph data for each level, and stores the results in a semantic web analysis database, a filtering agent that filters the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels and stores them in a semantic web filtered database, a personality analysis agent that assigns a personality to each filtered semantic web page, and a classification analysis agent that classifies the semantic web pages to which a personality has been assigned according to the percentage (%) of each personality, the indexing unit converting the collected web pages into semantic web pages and analyzing them at the word, paragraph, and article levels to generate a plurality of indexes for one web page; an index database that stores the indexes of each web page generated by the indexing unit; and a search agent that searches the index database according to a user's search term and processes semantic web-based document searches.

In addition, to achieve the above object, the indexing method of the present invention includes: a web page collection step of collecting web pages distributed on the Internet, processing them into semantic web pages, and storing them in a semantic web page database; a semantic analysis step of extracting the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, establishing frequency, relationship, and graph data for each level, and storing the results in a semantic web analysis database; a filtering step of filtering the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels and storing them in a semantic web filtered database; a personality analysis step of assigning a personality to each filtered semantic web page; and a classification analysis step of classifying the semantic web pages to which a personality has been assigned according to the percentage (%) of each personality. The collected web pages are thus converted into semantic web pages and analyzed at the word, paragraph, and article levels, so that a plurality of indexes is generated for one web page.

The web page collection step consists of a static web page collection step of collecting static web pages on the Internet, such as newspapers, forums, and editorials, which have a fixed source format and are operated by static rules, and a dynamic web page collection step of collecting dynamic web pages on the Internet, such as blogs and general web pages.

The semantic analysis step may include: extracting articles from the collected semantic web pages and establishing frequency, relationship, and graph data for them; extracting paragraphs from the collected semantic web pages and establishing frequency, relationship, and graph data for them; and extracting words from the collected semantic web pages and establishing frequency, relationship, and graph data for them.
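As a rough illustration of analyzing one page at the article, paragraph, and word levels, the sketch below splits a text into paragraphs and words and counts frequencies at each level. The source does not say how the levels are delimited or what the frequency data looks like, so the blank-line paragraph rule, the `[a-z]+` word rule, and the shape of the `analysis` record are all assumptions.

```python
import re
from collections import Counter


def split_levels(article):
    """Split one article into its paragraphs and its lowercase words."""
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    words = re.findall(r"[a-z]+", article.lower())
    return paragraphs, words


article = "The web is large.\n\nSearch engines index the web."
paragraphs, words = split_levels(article)

# One record per page, holding frequency data for all three levels.
analysis = {
    "article": {"paragraph_count": len(paragraphs), "word_count": len(words)},
    "paragraphs": [Counter(re.findall(r"[a-z]+", p.lower())) for p in paragraphs],
    "words": Counter(words),
}

print(analysis["article"])       # {'paragraph_count': 2, 'word_count': 9}
print(analysis["words"]["web"])  # 2
```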

The filtering step may include: refining the data to be deleted from the articles of the analyzed semantic web pages; refining the data to be deleted from the paragraphs of the analyzed semantic web pages; and refining the data to be deleted from the words of the analyzed semantic web pages.

In the indexing method using the semantic web of the present invention, additional indexing takes place between gathering and indexing, so hundreds of indexes may exist for one web page. These indexes are word- and paragraph-oriented.

Therefore, whereas a conventional search engine such as Google provides search results from a stored DB built with a single indexing method, a search engine to which the present invention is applied holds hundreds of indexes per web page, based on semantic web concepts that capture the meaning of words, which makes meaning search possible.
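One way to picture "many indexes per page" is an index whose entries point at individual paragraphs rather than at whole pages. The sketch below is an assumption about what word- and paragraph-oriented indexing could look like; the source does not specify the index layout, and `build_multi_index` is a hypothetical name.

```python
from collections import defaultdict


def build_multi_index(pages):
    """Index every word of every paragraph, so one page yields many entries.

    Maps word -> set of (page_id, paragraph_number).
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for number, paragraph in enumerate(text.split("\n\n")):
            for word in paragraph.lower().split():
                index[word].add((page_id, number))
    return index


pages = {"p1": "bank raises rates\n\nfilm wins award"}
index = build_multi_index(pages)
print(sorted(index["rates"]))  # [('p1', 0)]
print(sorted(index["film"]))   # [('p1', 1)]
print(len(index))              # 6
```

With this layout a query for "film" can be answered with the second paragraph of the page instead of the page as a whole.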

The technical problems achieved by the present invention and the practice of the present invention will be more clearly understood by the preferred embodiments of the present invention described below. The following examples are merely illustrative of the present invention and are not intended to limit the scope of the present invention.

FIG. 2 illustrates the overall structure of a semantic web based search engine according to the present invention.

As shown in FIG. 2, the semantic web-based search engine according to the present invention is implemented in a semantic web-based search site 200 that a plurality of users 110 can access through the Internet 102. The semantic web-based search site 200 is composed of a client interface 202, a search agent (SA; 204), an index database 206, an indexing unit 210 that collects web pages from a static Internet 102-1 or a dynamic Internet 102-2, analyzes them, and indexes them, a policy agent (PA; 220), a doctor agent (DA; 222), and a monitoring agent (MA; 224). The indexing unit 210 includes a static web page gathering agent (GA; 211), a dynamic web page gathering agent (GA; 212), a filtering agent (FA; 213), and an analysis agent (AA; 214).

Referring to FIG. 2, the search engine according to the present invention has a main solution group consisting of seven agents. The policy agent (PA) 220, located above all the other agents, is responsible for the policy function of requesting and directing the agents to perform specific functions.

The gathering agents (GA) 211 and 212 collect web pages, the filtering agent (FA) 213 refines the data (converts it into a usable form), and the analysis agent (AA) 214 analyzes the collected data and stores the index in the index database 206. The static web page gathering agent 211 collects web pages from the static Internet 102-1, that is, from web pages such as newspapers, forums, and editorials that have a fixed source format and are operated by static rules, and the dynamic web page gathering agent 212 collects web pages from the dynamic Internet 102-2, such as blogs and general web pages.

The search agent (SA) 204 processes ontology searches and semantic web document searches. The monitoring agent (MA) 224 is a tool that monitors calculation errors detected or corrected in the indexing unit 210 and reports them to the policy agent 220. The doctor agent (DA) 222 is responsible for checking updates of the indexing unit 210 and treating errors at the request of the policy agent 220.
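The division of labor among the policy, monitoring, and doctor agents can be sketched with three small classes. This is a hypothetical arrangement for illustration: the source describes the roles but not the interfaces, so the method names (`watch`, `report`, `treat`) and the rule that every reported error is passed to the doctor agent are assumptions.

```python
class DoctorAgent:
    """Repairs errors when the policy agent asks it to."""

    def treat(self, error):
        return f"treated: {error}"


class PolicyAgent:
    """Sits above the other agents and decides how errors are handled."""

    def __init__(self, doctor):
        self.doctor = doctor
        self.history = []

    def report(self, error):
        # Policy in this sketch: every reported error goes to the doctor.
        self.history.append(self.doctor.treat(error))


class MonitoringAgent:
    """Runs a task, catches its errors, and reports them to the policy agent."""

    def __init__(self, policy):
        self.policy = policy

    def watch(self, task, *args):
        try:
            return task(*args)
        except ValueError as error:
            self.policy.report(str(error))
            return None


def index_page(text):
    """Stand-in for the indexing unit; rejects empty pages."""
    if not text:
        raise ValueError("empty page")
    return text.split()


policy = PolicyAgent(DoctorAgent())
monitor = MonitoringAgent(policy)
print(monitor.watch(index_page, "semantic web"))  # ['semantic', 'web']
print(monitor.watch(index_page, ""))              # None
print(policy.history)                             # ['treated: empty page']
```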

FIG. 3 is a flowchart illustrating a procedure of indexing using the semantic web according to the present invention, and FIG. 4 is an example of semantic web solution indexing using the semantic web according to the present invention.

As shown in FIG. 3, the indexing process using the semantic web according to the present invention generates the index DB for semantic web search through a web page collection step (S301), a semantic analysis step (S302), a filtering step (S303), a personality analysis step (S304), and a classification step (S305).

The web page collection step (S301) takes web pages from the Internet, processes the unrefined data of dynamic and static web pages into web data for semantic processing, and stores it in the semantic web page database 402. To this end, the GA.S.ST (Gathering Agent for Semantic pages at Static Web pages) solution 401a collects web pages from the static Internet 102-1, which consists of web pages such as newspapers, forums, and editorials that have a uniform source format and are operated by static rules, processes them into data for semantic pages, and stores them in the semantic web page database 402. The GA.SD (Gathering Agent for Semantic pages at Dynamic Web pages) solution 401b collects web pages from the dynamic Internet 102-2, which consists of dynamic web pages such as blogs and general web pages, processes them into data for semantic pages, and stores them in the semantic web page database 402.
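A minimal sketch of the collection step follows, under two assumptions that the source does not confirm: that "processing into data for semantic pages" can be approximated by stripping markup and keeping the text, and that static sources can be recognized from a known host list. The hosts, `STATIC_SOURCES`, and `to_semantic_page` are hypothetical, and no network access is performed.

```python
from html.parser import HTMLParser

# Hypothetical list of sources known to follow a fixed page format.
STATIC_SOURCES = {"news.example.com", "forum.example.com"}


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def to_semantic_page(host, html):
    """Turn raw HTML into a plain record tagged with its source type."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    kind = "static" if host in STATIC_SOURCES else "dynamic"
    return {"host": host, "kind": kind, "text": "\n\n".join(parser.chunks)}


# Stand-in for the semantic web page database 402.
page_db = {}
raw_pages = [
    ("news.example.com", "<h1>Rates rise</h1><p>The bank raised rates.</p>"),
    ("blog.example.org", "<p>My trip to Busan.</p>"),
]
for host, html in raw_pages:
    page_db[host] = to_semantic_page(host, html)

print(page_db["news.example.com"]["kind"])  # static
print(page_db["blog.example.org"]["text"])  # My trip to Busan.
```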

The semantic analysis step (S302) takes the semantic web pages stored in the semantic web page database 402 by the web page collection step (S301), divides them into articles (403a), paragraphs (403b), and words (403c), and processes them into frequency and relationship analysis data. To this end, the Semantic Article Analysis Agent (AA.SA: Analysis Agent for Semantic Article) solution 404a extracts the article 403a of each web page from the collected semantic web page database 402, establishes frequency, relationship, and graph data, and stores the article analysis data 405a in the semantic web analysis database 406. The Analysis Agent for Semantic Paragraph (AA.SP) solution 404b extracts the paragraphs 403b of each web page from the collected semantic web page database 402, establishes frequency, relationship, and graph data, and stores the paragraph analysis data 405b in the semantic web analysis database 406. The Semantic Word Analysis Agent (AA.SW) solution 404c extracts the words 403c of each web page from the collected semantic web page database 402, establishes frequency, relationship, and graph data, and stores the word analysis data 405c in the semantic web analysis database 406. The analysis data produced separately for each level are integrated and stored in the semantic web analysis database 406.
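The source does not define how "relationship" and "graph" data are computed. One common reading is a co-occurrence graph, in which two words are related when they appear in the same paragraph and the edge weight counts how often that happens. The sketch below implements that reading only; it is an assumption and not the patented analysis.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(paragraphs):
    """Count how often each pair of distinct words shares a paragraph.

    The result maps an alphabetically ordered word pair to its weight,
    which can be read as a weighted, undirected graph edge.
    """
    edges = Counter()
    for paragraph in paragraphs:
        unique_words = sorted(set(paragraph.lower().split()))
        for pair in combinations(unique_words, 2):
            edges[pair] += 1
    return edges


paragraphs = ["bank raises rates", "bank cuts rates", "rates fall"]
graph = cooccurrence_graph(paragraphs)
print(graph[("bank", "rates")])  # 2
print(graph[("fall", "rates")])  # 1
```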

The filtering step (S303) removes or refines the junk data to be deleted from the data stored in the semantic web analysis database 406. The filter agent for semantic article (FA.SA) solution 407a refines the data to be deleted from the article analysis data 405a in the semantic web analysis database 406 and stores the resulting article filtered data 408a in the semantic web filtered database 409.

The Semantic Paragraph Filtering Agent (FA.SP) solution 407b refines the data to be deleted from the paragraph analysis data 405b in the semantic web analysis database 406 and stores the resulting paragraph filtered data 408b in the semantic web filtered database 409. The Semantic Word Filtering Agent (FA.SW) solution 407c refines the data to be deleted from the word analysis data 405c in the semantic web analysis database 406 and stores the resulting word filtered data 408c in the semantic web filtered database 409. The article filtered data 408a, the paragraph filtered data 408b, and the word filtered data 408c are thus all stored in the semantic web filtered database 409.
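The source does not state which data counts as "data to be deleted" at each level. The sketch below picks one plausible rule per level purely for illustration: duplicate articles, very short paragraphs, and stop words. The rules, `STOP_WORDS`, and `MIN_PARAGRAPH_WORDS` are all assumptions.

```python
# Hypothetical filter rules; the source does not say what is deleted.
STOP_WORDS = {"the", "a", "is", "of"}
MIN_PARAGRAPH_WORDS = 3


def filter_words(word_counts):
    """Drop stop words from a word -> count mapping."""
    return {w: c for w, c in word_counts.items() if w not in STOP_WORDS}


def filter_paragraphs(paragraphs):
    """Drop paragraphs too short to carry content, such as menu labels."""
    return [p for p in paragraphs if len(p.split()) >= MIN_PARAGRAPH_WORDS]


def filter_articles(articles):
    """Drop exact duplicate articles, keeping the first occurrence."""
    seen, kept = set(), []
    for text in articles:
        if text not in seen:
            seen.add(text)
            kept.append(text)
    return kept


# Stand-in for one record of the semantic web filtered database 409.
filtered = {
    "articles": filter_articles(["A story.", "A story.", "Another story."]),
    "paragraphs": filter_paragraphs(["Home", "The bank raised rates today."]),
    "words": filter_words({"the": 5, "bank": 2, "rates": 1}),
}
print(filtered["articles"])    # ['A story.', 'Another story.']
print(filtered["paragraphs"])  # ['The bank raised rates today.']
print(filtered["words"])       # {'bank': 2, 'rates': 1}
```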

The personality analysis step (S304) assigns a personality, such as economy, politics, culture, or entertainment, to each semantic web page in the semantic web filtered database 409. The personality analysis agent (AA.CS: Analysis Agent for Character at Semantic Web pages) solution 410 assigns a personality to the semantic web filtered data and stores the personality analysis data 411 in the semantic web personality database 412.
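The source names the personalities (economy, politics, culture, entertainment) but not how they are detected. As a minimal sketch, assuming a simple keyword lookup, each personality's share can be computed as a percentage of all keyword hits. `CATEGORY_KEYWORDS` and its contents are invented for the example.

```python
from collections import Counter

# Hypothetical keyword lists; the source does not say how a
# personality is detected.
CATEGORY_KEYWORDS = {
    "economy": {"bank", "rates", "market"},
    "politics": {"election", "vote", "party"},
    "culture": {"film", "music", "museum"},
}


def personality_percentages(words):
    """Return each category's share (in %) of the category keyword hits."""
    hits = Counter()
    for word in words:
        for category, keywords in CATEGORY_KEYWORDS.items():
            if word in keywords:
                hits[category] += 1
    total = sum(hits.values())
    if total == 0:
        return {}
    return {category: 100 * count / total for category, count in hits.items()}


words = ["bank", "rates", "election", "market", "today"]
print(personality_percentages(words))  # {'economy': 75.0, 'politics': 25.0}
```

A page can carry several personalities at once under this scheme, which is what makes a percentage-based classification meaningful.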

The classification step (S305) groups the semantic web data to which a personality has been assigned and classifies the analysis data. The classification analysis agent (AA.GS: Analysis Agent for Grouping at Semantic Web pages) solution 413 classifies the personality-assigned semantic web data in the semantic web personality database 412 according to the percentage of each personality and stores the classified analysis data 414 in the semantic web classification database 415, thereby creating the index database.
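Classification by percentage could, for example, place a page in every group whose personality share reaches a threshold and order each group by that share. The sketch below assumes this rule; the threshold value, the sample percentages, and the `classify` function are hypothetical and not taken from the source.

```python
from collections import defaultdict

# Hypothetical personality percentages for three pages, of the kind
# the personality analysis step might produce.
PAGE_PERSONALITIES = {
    "page1": {"economy": 75.0, "politics": 25.0},
    "page2": {"politics": 60.0, "culture": 40.0},
    "page3": {"economy": 50.0, "culture": 50.0},
}


def classify(pages, threshold=40.0):
    """Group page ids under every personality whose share meets the threshold.

    Within a group, pages are ordered by descending percentage.
    """
    groups = defaultdict(list)
    for page_id, shares in pages.items():
        for personality, percent in shares.items():
            if percent >= threshold:
                groups[personality].append((percent, page_id))
    return {
        personality: [
            page for _, page in sorted(members, key=lambda m: (-m[0], m[1]))
        ]
        for personality, members in groups.items()
    }


classified = classify(PAGE_PERSONALITIES)
print(classified["economy"])   # ['page1', 'page3']
print(classified["culture"])   # ['page3', 'page2']
print(classified["politics"])  # ['page2']
```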

The present invention has been described above with reference to one embodiment shown in the drawings, but those skilled in the art will understand that various modifications and equivalent other embodiments are possible therefrom.

FIG. 1 is a diagram showing the structure of a general search engine;

FIG. 2 is a diagram illustrating the overall structure of a semantic web based search engine according to the present invention;

FIG. 3 is a flowchart illustrating a procedure of indexing using the semantic web according to the present invention; and

FIG. 4 illustrates a detailed example of indexing using the semantic web in accordance with the present invention.

Claims

A semantic web-based search engine comprising:

An indexing unit comprising a gathering agent that collects web pages distributed on the Internet, processes them into semantic web pages, and stores them in a semantic web page database, a semantic analysis agent that extracts the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, establishes frequency, relationship, and graph data for each level, and stores the results in a semantic web analysis database, a filtering agent that filters the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels and stores them in a semantic web filtered database, a personality analysis agent that assigns a personality to each filtered semantic web page, and a classification analysis agent that classifies the semantic web pages to which a personality has been assigned according to the percentage (%) of each personality, the indexing unit converting the collected web pages into semantic web pages and analyzing them at the word, paragraph, and article levels to generate a plurality of indexes for one web page;

An index database storing the index of each web page generated by the indexing unit; and

A search agent for searching the index database according to a user's search term and processing a document search based on the semantic web.

The semantic web-based search engine of claim 1, further comprising:

A policy agent which is located above the agents belonging to the indexing unit and the search agent and is responsible for a policy function of requesting and directing those agents to perform specific functions;

A monitoring agent that monitors calculation errors detected or corrected in the indexing unit and transmits them to the policy agent; and

A doctor agent responsible for checking updates of the indexing unit and treating errors at the request of the policy agent.

The semantic web-based search engine of claim 1, wherein the gathering agent comprises:

A static web page gathering agent that collects static web pages on the Internet, such as newspapers, forums, and editorials, which have a fixed source format and are operated by static rules; and

A dynamic web page gathering agent that collects dynamic web pages on the Internet, such as blogs and general web pages.

A semantic web-based indexing method comprising:

A web page collection step of collecting web pages distributed on the Internet, processing them into semantic web pages, and storing the semantic web pages in a semantic web page database;

A semantic analysis step of extracting the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, establishing frequency, relationship, and graph data for each level, and storing the results in a semantic web analysis database;

A filtering step of filtering the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels, and storing them in a semantic web filtered database;

A personality analysis step of assigning a personality to each filtered semantic web page; and

A classification analysis step of classifying the semantic web pages to which a personality has been assigned according to the percentage (%) of each personality,

Wherein the collected web pages are converted into semantic web pages and analyzed at the word, paragraph, and article levels to generate a plurality of indexes for one web page.

The semantic web-based indexing method of claim 4, wherein the web page collection step comprises:

A static web page collection step of collecting static web pages on the Internet, such as newspapers, forums, and editorials, which have a fixed source format and are operated by static rules; and

A dynamic web page collection step of collecting dynamic web pages on the Internet, such as blogs and general web pages.

The semantic web-based indexing method of claim 4, wherein the semantic analysis step comprises:

Extracting articles from the collected semantic web pages and establishing frequency, relationship, and graph data for them;

Extracting paragraphs from the collected semantic web pages and establishing frequency, relationship, and graph data for them; and

Extracting words from the collected semantic web pages and establishing frequency, relationship, and graph data for them.

The semantic web-based indexing method of claim 6, wherein the filtering step comprises:

Refining the data to be deleted from the articles of the analyzed semantic web pages;

Refining the data to be deleted from the paragraphs of the analyzed semantic web pages; and

Refining the data to be deleted from the words of the analyzed semantic web pages.