KR101044633B1 - Semantic Web-based Index Method and Search Engine Using the Same - Google Patents

Semantic Web-based Index Method and Search Engine Using the Same

Publication number
KR101044633B1
Authority
KR
South Korea
Prior art keywords
semantic web
semantic
web page
database
agent
Prior art date
Application number
KR1020080095268A
Other languages
Korean (ko)
Other versions
KR20090033149A (en)
Inventor
조광현
Original Assignee
조광현
주식회사 시맨틱스
Priority date
Filing date
Publication date
Application filed by 조광현, 주식회사 시맨틱스
Publication of KR20090033149A
Application granted
Publication of KR101044633B1

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computational Linguistics (AREA)

Abstract

The present invention relates to a web search technology, and more particularly, to a semantic web-based indexing method for constructing a search database using the semantic web, and a search engine using the same.

The indexing method of the present invention includes: a web page collection step of collecting web pages distributed on the Internet, processing them into semantic web pages, and storing them in a semantic web page database; a semantic analysis step of extracting the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, establishing the frequency, relationship, and graph for each level, and storing the results in a semantic web analysis database; a filtering step of filtering the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels and storing them in a semantic web filtered database; a personality analysis step of assigning a personality to each filtered semantic web page; and a classification analysis step of classifying the semantic web pages to which a personality has been assigned according to the percentage (%) of each personality. The collected web pages are thus converted into semantic web pages and analyzed at the word, paragraph, and article levels, so that multiple indexes are generated for a single web page.

Figure R1020080095268

Index, Search Engine, Semantic Web, Filtering, Analysis

Description

Semantic web-based indexing method and search engine using the same {SEMANTIC WEB BASED INDEX METHOD AND SEARCH ENGINE USING THE SAME}

The present invention relates to a web search technology, and more particularly, to a semantic web-based indexing method for constructing a search database using the semantic web, and a search engine using the same.

In general, conventional portal (search) sites such as Naver, Dreamwiz, Daum, and Yahoo consist of a database that classifies and stores website information according to predetermined criteria, a search robot that continuously traverses the web and mechanically collects new website information, and a search engine that builds the collected data into a database so that users of the portal (search) site can search it. The site searches for and provides a list of sites similar to the user's keywords.

FIG. 1 is a diagram showing the overall structure of a general search engine.

Referring to FIG. 1, an Internet search engine is an information retrieval system that enables searching for documents on the web, and may be broadly divided into data collection (S1), indexing (S2), and search (S3) parts. In the data collection (S1) part, document collection programs 12, called spiders or crawlers, follow link information to collect web documents stored on computers around the world connected to the World Wide Web network 11, and store them in the database 13.

In the indexing (S2) part, the index module 14 stores index information for the collected web documents in the index database 16 in order to speed up the search and reduce the amount of data to be stored.

In the search (S3) part, whenever the searcher 17 enters a request for information, the search engine 18 searches the index information stored in the index database 16, the ranking system 20 ranks the search results, and the results are provided to the searcher 17 in ranked order. The search engine 18 also controls the spider program 12 through the spider control 19 to improve search performance, and the index module 14 and the analysis module 15 analyze the collected web documents for indexing.
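The index (S2) and search (S3) parts described above can be sketched roughly as follows. This is a minimal illustration only: the function names `build_index` and `search`, the sample documents, and ranking by summed term counts are assumptions made for the example, not details taken from FIG. 1.

```python
from collections import Counter, defaultdict

def build_index(documents):
    """Map each term to {doc_id: term count}, playing the role of the index database 16."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index

def search(index, query):
    """Return doc ids ordered by summed term counts, a stand-in for the ranking system 20."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by document id for a stable order.
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical collected documents (the role of the database 13).
documents = {
    "d1": "semantic web search engine",
    "d2": "web crawler collects web documents",
    "d3": "cooking recipes",
}
index = build_index(documents)
results = search(index, "web search")
```

Note that each document here has exactly one index built by pure keyword matching, which is the limitation of conventional engines that the description goes on to discuss.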

Internet search engines are classified by search method into directory search engines, keyword search engines, and meta search engines. A directory search engine classifies materials by subject or category and builds a database by adding explanations and evaluations. A keyword search engine collects web documents with a web document collection program, stores the collected documents in the search engine's database through an indexing process, and answers the user's query by keyword matching. A meta search engine collects results for the searcher's query from other search engines and presents them in combined form, so the searcher obtains varied search results; its advantage is that it does not need space to store data internally.

The Semantic Web, on the other hand, gives well-defined meaning to information on the web so that computers as well as humans can easily interpret the meaning of documents. It was proposed in order to automate tasks such as searching, interpreting, and integrating information by computer.

Unlike existing web documents, which are written mainly in natural language, Semantic Web documents carry meaning that computers can easily interpret, so automated agents and sophisticated search engines can use that meaning to achieve a high level of automation and intelligence.

In the Semantic Web, a resource is described by triples of resource, attribute, and attribute value (that is, subject, predicate, and object), and RDF (Resource Description Framework) is defined as the framework for resource representation. The Semantic Web uses SPARQL as the query language for retrieving resource information expressed in RDF; SPARQL also serves as the protocol for transmitting queries in a client-server environment.
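The triple model can be illustrated with a small sketch in plain Python. This is not an RDF library or a SPARQL engine: the `Triple` type, the `match` helper, and the `ex:` identifiers are hypothetical names introduced only to show how a pattern with unbound positions selects triples.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # the resource
    predicate: str  # the attribute
    obj: str        # the attribute value

def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None behaves like an unbound query variable."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

# Hypothetical data describing two web pages.
triples = [
    Triple("ex:page1", "ex:topic", "ex:Economy"),
    Triple("ex:page1", "ex:author", "ex:Kim"),
    Triple("ex:page2", "ex:topic", "ex:Politics"),
]

# Roughly what the SPARQL query  SELECT ?s WHERE { ?s ex:topic ex:Economy }  asks for.
economy_pages = [t.subject for t in match(triples, predicate="ex:topic", obj="ex:Economy")]
```

A real deployment would use an RDF store and a SPARQL endpoint; the sketch only shows why triples make machine interpretation straightforward.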

The general goal of an information retrieval system is to understand both the user's intention and the stored documents, so that from a large amount of stored information it can efficiently retrieve and deliver the documents the user needs without omission.

However, a conventional search engine such as Google performs no additional indexing between the gathering and indexing processes; pages are indexed in the order 'A, B, C, D…', and there is only one index per web page.

Therefore, a conventional search engine often returns web pages that do not match what the user meant by the entered keyword, which makes searching inconvenient, and the user must repeatedly enter different keywords until the desired information is obtained.

SUMMARY OF THE INVENTION The present invention has been proposed to solve the above problems. An object of the present invention is to provide a semantic web-based indexing method, and a search engine using the same, that build indexes on a web page in various ways through the semantic web so that a meaning search can be performed on the keywords entered by a user, enabling the user to quickly find information that matches the user's intention.

To achieve the above object, the search engine of the present invention includes: an indexing unit that converts collected web pages into semantic web pages and analyzes them at the word, paragraph, and article levels to generate a plurality of indexes for one web page; an index database that stores the index of each web page generated by the indexing unit; and a search agent that searches the index database according to a user's search term and processes document search based on the semantic web. The indexing unit comprises a gathering agent that collects web pages distributed on the Internet, processes them into semantic web pages, and stores them in a semantic web page database; a semantic analysis agent that extracts the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, establishes the frequency, relationship, and graph for each level, and stores the results in a semantic web analysis database; a filtering agent that filters the semantic web pages stored in the semantic web analysis database at each level and stores them in a semantic web filtered database; a personality analysis agent that assigns a personality to each filtered semantic web page; and a classification analysis agent that classifies the semantic web data to which a personality has been assigned according to the percentage of each personality.

In addition, to achieve the above object, the indexing method of the present invention includes: a web page collection step of collecting web pages distributed on the Internet, processing them into semantic web pages, and storing them in a semantic web page database; a semantic analysis step of extracting the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, establishing the frequency, relationship, and graph for each level, and storing the results in a semantic web analysis database; a filtering step of filtering the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels and storing them in a semantic web filtered database; a personality analysis step of assigning a personality to each filtered semantic web page; and a classification analysis step of classifying the semantic web pages to which a personality has been assigned according to the percentage (%) of each personality. The collected web pages are thus converted into semantic web pages and analyzed at the word, paragraph, and article levels, so that a plurality of indexes is generated for one web page.

The web page collection step consists of a static web page collection step of collecting static web pages on the Internet, such as newspapers, forums, and editorials, that have a fixed source format and are operated by static rules, and a dynamic web page collection step of collecting dynamic web pages on the Internet, such as blogs and general web pages.

The semantic analysis step may include: extracting articles from the collected semantic web pages and establishing their frequency, relationship, and graph; extracting paragraphs from the collected semantic web pages and establishing their frequency, relationship, and graph; and extracting words from the collected semantic web pages and establishing their frequency, relationship, and graph.

The filtering step may include: refining the data to be deleted from the articles of the analyzed semantic web pages; refining the data to be deleted from the paragraphs of the analyzed semantic web pages; and refining the data to be deleted from the words of the analyzed semantic web pages.

In the indexing method using the semantic web of the present invention, additional indexing takes place between the gathering and indexing processes, so hundreds of indexes may exist for one web page. These indexes are word- and paragraph-oriented.

Therefore, whereas a conventional search engine such as Google provides search results from a database built with a single indexing method, a search engine to which the present invention is applied holds hundreds of semantic web indexes per web page that capture the meaning of its words, which makes a meaning search possible.

The technical problems achieved by the present invention and the practice of the present invention will be more clearly understood by the preferred embodiments of the present invention described below. The following examples are merely illustrative of the present invention and are not intended to limit the scope of the present invention.

FIG. 2 illustrates the overall structure of a semantic web-based search engine according to the present invention.

As shown in FIG. 2, the semantic web-based search engine according to the present invention is implemented in a semantic web-based search site 200 that a plurality of users 110 can access through the Internet 102. The semantic web-based search site 200 is composed of a client interface 202, a search agent (SA; 204), an index database 206, an indexing unit 210 that collects web pages from the static Internet 102-1 or the dynamic Internet 102-2 and indexes them after analysis, a policy agent (PA; 220), a doctor agent (DA; 222), and a monitoring agent (MA; 224). The indexing unit 210 includes a static web page gathering agent (GA; 211), a dynamic web page gathering agent (GA; 212), a filtering agent (FA; 213), and an analysis agent (AA; 214).

Referring to FIG. 2, the search engine according to the present invention has a main solution group consisting of seven agents. The policy agent (PA) 220, located above all the other agents, is responsible for the policy function of requesting and directing agents to perform specific functions.

The gathering agents (GA) 211 and 212 collect web pages, the filtering agent (FA) 213 refines the data (changes it into a usable form), and the analysis agent (AA) 214 analyzes the collected data and stores the index in the index database 206. The static web page gathering agent 211 collects web pages from the static Internet 102-1, that is, from pages such as newspapers, forums, and editorials that have a fixed source format and are operated by static rules, and the dynamic web page gathering agent 212 collects semantic web pages from the dynamic Internet 102-2, such as blogs and general web pages.

The search agent (SA) 204 processes ontology search and semantic web document search. The monitoring agent (MA) 224 is a tool that monitors calculation errors detected or corrected in the indexing unit 210 and delivers them to the policy agent 220. The doctor agent (DA) 222 is responsible for checking updates of the indexing unit 210 and treating errors at the request of the policy agent 220.

FIG. 3 is a flowchart illustrating a procedure of indexing using the semantic web according to the present invention, and FIG. 4 is an example of a semantic web solution for indexing using the semantic web according to the present invention.

As shown in FIG. 3, the indexing process using the semantic web according to the present invention generates the index DB for semantic web search through a web page collection step (S301), a semantic analysis step (S302), a filtering step (S303), a personality analysis step (S304), and a classification step (S305).

The web page collection step (S301) takes web pages from the Internet, processes the unrefined data of dynamic and static web pages into web data for semantics, and stores it in the semantic web page database 402. To this end, the Gathering Agent for Semantic pages at Static Web pages (GA.S.ST) solution 401a collects web pages from the static Internet 102-1, which consists of web pages such as newspapers, forums, and editorials that have a uniform source format and are operated by static rules, processes them into data for semantic pages, and stores them in the semantic web page database 402. The Gathering Agent for Semantic pages at Dynamic Web pages (GA.SD) solution 401b collects web pages from the dynamic Internet 102-2, which consists of dynamic web pages such as blogs and general web pages, processes them into data for semantic pages, and stores them in the semantic web page database 402.
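One way to picture the collection step (S301) is the sketch below. It assumes, purely for illustration, that the source kind of a page is already known and that "processing into data for semantic pages" is reduced to whitespace normalization; `SemanticPage`, `gather`, and the example URLs are hypothetical names, not part of the described solutions 401a and 401b.

```python
from dataclasses import dataclass

# Source kinds the description names as static; everything else is treated as dynamic.
STATIC_SOURCES = {"newspaper", "forum", "editorial"}

@dataclass
class SemanticPage:
    url: str
    source_kind: str
    collector: str  # "static" (role of 401a) or "dynamic" (role of 401b)
    text: str

def gather(url, source_kind, raw_text, database):
    """Choose a collector by source kind, normalize whitespace, and store the page."""
    collector = "static" if source_kind in STATIC_SOURCES else "dynamic"
    page = SemanticPage(url, source_kind, collector, " ".join(raw_text.split()))
    database[url] = page  # the dict stands in for the semantic web page database 402
    return page

semantic_web_page_db = {}
gather("http://example.com/news/1", "newspaper", "Rates   rise\nagain", semantic_web_page_db)
gather("http://example.com/blog/7", "blog", "My  trip", semantic_web_page_db)
```

Both collectors write to the same store, which matches the description: the two gathering solutions differ in where they collect from, not in where the result is kept.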

The semantic analysis step (S302) divides the semantic web pages stored in the semantic web page database 402 by the web page collection step (S301) into articles 403a, paragraphs 403b, and words 403c, and processes them into frequency and relationship analysis data. To this end, the Analysis Agent for Semantic Article (AA.SA) solution 404a extracts the article 403a of each web page from the collected semantic web page database 402, establishes its frequency, relationship, and graph, and stores the article analysis data 405a in the semantic web analysis database 406. The Analysis Agent for Semantic Paragraph (AA.SP) solution 404b extracts the paragraphs 403b of each web page from the collected semantic web page database 402, establishes their frequency, relationship, and graph, and stores the paragraph analysis data 405b in the semantic web analysis database 406. The Semantic Word Analysis Agent (AA.SW) solution 404c extracts the words 403c of each web page from the collected semantic web page database 402, establishes their frequency, relationship, and graph, and stores the word analysis data 405c in the semantic web analysis database 406. The analysis data divided and analyzed by level are integrated into one and stored in the semantic web analysis database 406.
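The three-level analysis can be sketched as follows. The description does not define how "relationship" and "graph" are computed, so this example assumes that two words are related when they occur in the same paragraph and records those pairs as weighted graph edges; splitting paragraphs on blank lines and the function name `analyze` are likewise assumptions.

```python
from collections import Counter
from itertools import combinations
import re

def analyze(article):
    """Return article-, paragraph-, and word-level data plus a word co-occurrence graph."""
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    # Paragraph level: one word-frequency table per paragraph.
    paragraph_freqs = [Counter(re.findall(r"\w+", p.lower())) for p in paragraphs]
    # Word level: frequencies over the whole article.
    word_freq = sum(paragraph_freqs, Counter())
    # Relationship/graph (assumed): edge weight = number of paragraphs sharing both words.
    edges = Counter()
    for freq in paragraph_freqs:
        for a, b in combinations(sorted(freq), 2):
            edges[(a, b)] += 1
    return {
        "article": {"paragraph_count": len(paragraphs), "word_count": sum(word_freq.values())},
        "paragraphs": paragraph_freqs,
        "words": word_freq,
        "graph": edges,
    }

analysis = analyze("stock market rises\n\nmarket news today")
```

The single returned dictionary mirrors the last sentence of the step: the per-level results are integrated and stored together, here in one structure rather than in the semantic web analysis database 406.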

The filtering step (S303) removes or refines the waste data to be deleted from the data stored in the semantic web analysis database 406. The Filter Agent for Semantic Article (FA.SA) solution 407a refines the data to be deleted from the article analysis data 405a of the semantic web analysis database 406 and stores the resulting article filtered data 408a in the semantic web filtered database 409.

The Semantic Paragraph Filtering Agent (FA.SP) solution 407b refines the data to be deleted from the paragraph analysis data 405b of the semantic web analysis database 406 and stores the resulting paragraph filtered data 408b in the semantic web filtered database 409. The Semantic Word Filtering Agent (FA.SW) solution 407c refines the data to be deleted from the word analysis data 405c of the semantic web analysis database 406 and stores the resulting word filtered data 408c in the semantic web filtered database 409. The article filtered data 408a, the paragraph filtered data 408b, and the word filtered data 408c are thus all stored in the semantic web filtered database 409.
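A minimal sketch of level-by-level filtering is given below. The description does not say which data counts as "to be deleted", so the criteria here (a small stop-word list, a minimum paragraph length, and dropping articles with no surviving paragraph) are assumptions chosen only to make the three levels concrete.

```python
from collections import Counter

# Illustrative stop-word list; an assumption, not taken from the description.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and"})

def filter_words(word_freq, stop_words=STOP_WORDS):
    """Word level: drop stop words from a frequency table."""
    return Counter({w: c for w, c in word_freq.items() if w not in stop_words})

def filter_paragraphs(paragraphs, min_words=3):
    """Paragraph level: drop fragments shorter than min_words (for example menu or ad text)."""
    return [p for p in paragraphs if len(p.split()) >= min_words]

def filter_article(paragraphs, word_freq, min_words=3):
    """Article level: keep the article only if at least one paragraph survives."""
    kept = filter_paragraphs(paragraphs, min_words)
    if not kept:
        return None
    return {"paragraphs": kept, "words": filter_words(word_freq)}

filtered = filter_article(
    ["the market rises again", "click here"],
    Counter({"the": 1, "market": 1, "rises": 1, "again": 1, "click": 1, "here": 1}),
)
```

Keeping the three functions separate follows the structure of the step, in which each level has its own filtering solution (407a, 407b, 407c) writing to one filtered store.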

The personality analysis step (S304) assigns a personality, such as economy, politics, culture, or entertainment, to each semantic web page in the semantic web filtered database 409. The personality analysis agent (AA.CS: Analysis Agent for Character at Semantic Web pages) solution 410 assigns a personality to the data in the semantic web filtered database 409 and stores the personality analysis data 411 in the semantic web personality database 412.
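How a personality might be assigned is not specified in the description. The sketch below assumes a simple keyword lexicon per personality and reports each personality's share of the matching words as a percentage; `LEXICON`, its vocabulary, and `personality_profile` are hypothetical.

```python
from collections import Counter

# Hypothetical lexicon: a few indicative words per personality named in the description.
LEXICON = {
    "economy": {"market", "stock", "bank"},
    "politics": {"election", "vote", "party"},
    "culture": {"museum", "festival"},
    "entertainment": {"movie", "music"},
}

def personality_profile(words, lexicon=LEXICON):
    """Return {personality: percentage} computed over the words that hit the lexicon."""
    hits = Counter()
    for word in words:
        for personality, vocabulary in lexicon.items():
            if word in vocabulary:
                hits[personality] += 1
    total = sum(hits.values())
    if total == 0:
        return {}  # no personality can be assigned
    return {personality: 100.0 * count / total for personality, count in hits.items()}

profile = personality_profile("stock market falls before election vote bank".split())
```

In this example three of the five matching words are economic and two are political, so the page receives a mixed profile rather than a single label, which is what a percentage-based classification needs as input.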

The classification step (S305) groups and extracts the semantic web data to which a personality has been assigned and classifies the analysis data. The classification agent (AA.GS: Analysis Agent for Grouping at Semantic Web pages) solution 413 classifies the semantic web data in the semantic web personality database 412 according to the percentage of each personality and stores the classified analysis data 414 in the semantic web classification database 415, thereby creating the index database.
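Classification by personality percentage could look like the sketch below. The 30% threshold, the ordering by descending share, and the names `classify` and `profiles` are assumptions for illustration; the description only states that pages are classified according to the percentage of each personality.

```python
from collections import defaultdict

def classify(profiles, threshold=30.0):
    """Group page ids under every personality whose share meets the threshold."""
    groups = defaultdict(list)
    for page_id, profile in profiles.items():
        for personality, percent in profile.items():
            if percent >= threshold:
                groups[personality].append((percent, page_id))
    # Within each group, list pages by descending share, then by page id.
    return {
        personality: [page_id for _, page_id in sorted(entries, key=lambda e: (-e[0], e[1]))]
        for personality, entries in groups.items()
    }

# Hypothetical personality percentages for three pages.
profiles = {
    "page1": {"economy": 60.0, "politics": 40.0},
    "page2": {"economy": 90.0, "culture": 10.0},
    "page3": {"entertainment": 100.0},
}
classified = classify(profiles)
```

Here page1 is entered under both economy and politics, a small-scale instance of one web page carrying more than one index entry.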

The present invention has been described above with reference to one embodiment shown in the drawings, but those skilled in the art will understand that various modifications and equivalent other embodiments are possible therefrom.

FIG. 1 is a diagram showing the structure of a general search engine;

FIG. 2 is a diagram illustrating the overall structure of a semantic web-based search engine according to the present invention;

FIG. 3 is a flowchart illustrating a procedure of indexing using the semantic web according to the present invention; and

FIG. 4 illustrates a detailed example of indexing using the semantic web in accordance with the present invention.

Claims (7)

1. A semantic web-based search engine comprising: an indexing unit including a gathering agent that collects web pages distributed on the Internet, processes them into semantic web pages, and stores them in a semantic web page database, a semantic analysis agent that extracts the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, establishes the frequency, relationship, and graph for each level, and stores the results in a semantic web analysis database, a filtering agent that filters the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels and stores them in a semantic web filtered database, a personality analysis agent that assigns a personality to each filtered semantic web page, and a classification analysis agent that classifies the semantic web pages to which a personality has been assigned according to the percentage (%) of each personality, the indexing unit converting the collected web pages into semantic web pages and analyzing them at the word, paragraph, and article levels to generate a plurality of indexes for one web page; an index database storing the index of each web page generated by the indexing unit; and a search agent that searches the index database according to a user's search term and processes document search based on the semantic web.

2. The search engine of claim 1, further comprising: a policy agent that is located above the agents belonging to the indexing unit and the search agent and is responsible for a policy function of requesting and directing the agents to perform specific functions; a monitoring agent that monitors calculation errors found or corrected in the indexing unit and transmits them to the policy agent; and a doctor agent responsible for checking updates of the indexing unit and treating errors at the request of the policy agent.

3. The search engine of claim 1, wherein the gathering agent consists of a static web page gathering agent that collects static web pages on the Internet, such as newspapers, forums, and editorials, that have a fixed source format and are operated by static rules, and a dynamic web page gathering agent that collects dynamic web pages on the Internet, such as blogs and general web pages.

4. A semantic web-based indexing method comprising: a web page collection step of collecting web pages distributed on the Internet, processing them into semantic web pages, and storing them in a semantic web page database; a semantic analysis step of extracting the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, establishing the frequency, relationship, and graph for each level, and storing the results in a semantic web analysis database; a filtering step of filtering the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels and storing them in a semantic web filtered database; a personality analysis step of assigning a personality to each filtered semantic web page; and a classification analysis step of classifying the semantic web pages to which a personality has been assigned according to the percentage (%) of each personality, wherein the collected web pages are converted into semantic web pages and analyzed at the word, paragraph, and article levels to generate a plurality of indexes for one web page.

5. The method of claim 4, wherein the web page collection step comprises: a static web page collection step of collecting static web pages on the Internet, such as newspapers, forums, and editorials, that have a fixed source format and are operated by static rules; and a dynamic web page collection step of collecting dynamic web pages on the Internet, such as blogs and general web pages.

6. The method of claim 4, wherein the semantic analysis step comprises: extracting articles from the collected semantic web pages and establishing their frequency, relationship, and graph; extracting paragraphs from the collected semantic web pages and establishing their frequency, relationship, and graph; and extracting words from the collected semantic web pages and establishing their frequency, relationship, and graph.

7. The method of claim 6, wherein the filtering step comprises: refining the data to be deleted from the articles of the analyzed semantic web pages; refining the data to be deleted from the paragraphs of the analyzed semantic web pages; and refining the data to be deleted from the words of the analyzed semantic web pages.
KR1020080095268A 2007-09-27 2008-09-29 Semantic Web-based Index Method and Search Engine Using the Same KR101044633B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR20070097332 2007-09-27
KR1020070097332 2007-09-27

Publications (2)

Publication Number Publication Date
KR20090033149A (en) 2009-04-01
KR101044633B1 (en) 2011-07-01

Family

ID=40759656

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020080095268A KR101044633B1 (en) 2007-09-27 2008-09-29 Semantic Web-based Index Method and Search Engine Using the Same

Country Status (1)

Country Link
KR (1) KR101044633B1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102181896B1 (en) * 2014-07-02 2020-11-23 삼성전자 주식회사 A method and system for presenting content on an electronic device
US10241994B2 (en) 2014-07-02 2019-03-26 Samsung Electronics Co., Ltd. Electronic device and method for providing content on electronic device
KR101589279B1 (en) * 2014-08-29 2016-01-28 한국전자통신연구원 Apparatus and method of classifying industrial control system webpage
CN107193873A (en) * 2017-04-17 2017-09-22 吉林工程技术师范学院 A kind of network search method based on semantic network technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002251394A (en) 2001-02-22 2002-09-06 Nec Corp Whole sentence retrieval system
KR20060103165A (en) * 2005-03-23 2006-09-28 조광현 Classified web sites search system and method
KR20060135173A (en) * 2005-06-24 2006-12-29 동아시테크주식회사 File management system
KR100759186B1 (en) 2006-05-29 2007-09-14 주식회사 케이티 System and method to provide web service that delivers information from semi structured web document and database

Also Published As

Publication number Publication date
KR20090033149A (en) 2009-04-01

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
LAPS Lapse due to unpaid annual fee