KR101044633B1 - Semantic Web-based Index Method and Search Engine Using the Same - Google Patents
- Publication number
- KR101044633B1 (application number KR1020080095268A)
- Authority
- KR
- South Korea
- Prior art keywords
- semantic web
- semantic
- web page
- database
- agent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computational Linguistics (AREA)
Abstract
The present invention relates to a web search technology, and more particularly, to a semantic web-based indexing method for constructing a search database using the semantic web, and a search engine using the same.
The index method of the present invention includes: a web page collecting step of collecting web pages distributed on the Internet, processing them as semantic web pages, and storing them in a semantic web page database; a semantic analysis step of extracting the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, setting the frequency, relationship, and graph for each level, and storing the results in a semantic web analysis database; a filtering step of filtering the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels and storing them in a semantic web filtered database; a personality analysis step of assigning a personality to each filtered semantic web page; and a classification analysis step of classifying the semantic web pages to which a personality has been given according to the percentage (%) of each personality. The collected web pages are thus converted into semantic web pages and analyzed at the word, paragraph, and article levels, so that multiple indexes are created for a single web page.
Index, Search Engine, Semantic Web, Filtering, Analysis
Description
The present invention relates to a web search technology, and more particularly, to a semantic web-based indexing method for constructing a search database using the semantic web, and a search engine using the same.
In general, conventional portal (search) sites such as Naver, Dreamwiz, Daum, and Yahoo consist of a database that classifies and stores web site information according to predetermined criteria, a search robot that mechanically collects new web site information while continuously traversing the web, and a search engine that builds the collected data into a database so that users of the portal (search) site can search it. The search engine searches this database and provides a list of sites matching the user's keywords.
FIG. 1 is a diagram showing the overall structure of a general search engine.
Referring to FIG. 1, an Internet search engine is an information retrieval system that enables searching for documents on the web, and may be broadly divided into a data collection part (S1), an index part (S2), and a search part (S3). In the data collection (S1) part,
In the index S2, index information of the web document collected by the
In the search S3 part, whenever the searcher 17 inputs the desired information, the
These Internet search engines are classified into directory search engines, keyword search engines, and meta search engines according to the search method. A directory search engine classifies materials by subject or category and builds a database by adding explanations and evaluations. A keyword search engine collects web documents with a web document collection program, stores the collected documents in the search engine's database through an indexing process, and answers the user's query by keyword matching. A meta search engine collects results for the searcher's query from other existing search engines and combines them, so the searcher obtains varied results. Its advantage is that it requires no internal space for storing data.
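The keyword matching described above can be illustrated with a small inverted index. This is only a sketch under assumptions: the function names `build_inverted_index` and `keyword_search`, the whitespace tokenization, and the sample documents are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def keyword_search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents, used only for illustration.
documents = {
    "doc1": "semantic web search engine",
    "doc2": "keyword search engine",
    "doc3": "semantic web page database",
}
index = build_inverted_index(documents)
matches = keyword_search(index, "search engine")
```

Here `matches` holds the documents that contain both query words. Pure keyword matching of this kind knows nothing about meaning, which is the limitation the description goes on to discuss.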
The Semantic Web, on the other hand, gives well-defined meaning to information on the Web so that computers as well as humans can easily interpret the meaning of documents. It was proposed in order to automate tasks such as searching, interpreting, and integrating information by computer.
Unlike existing web documents, which are centered on natural language, Semantic Web documents carry meaning that computers can easily interpret, so automated agents or sophisticated search engines can use that meaning to achieve a high level of automation and intelligence.
In the Semantic Web, resources are expressed as triples of resource, attribute, and attribute value, and RDF (Resource Description Framework) is defined as the framework for representing resources. The Semantic Web uses SPARQL as the query language for retrieving resource information expressed in RDF; SPARQL is also used as the protocol for transmitting queries in a client-server environment.
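To make the triple form concrete, the following sketch stores a few triples in memory and matches them against a pattern, loosely in the spirit of a SPARQL basic pattern where an unbound position acts as a variable. It does not use a real RDF library; the `Triple` type, the `match` function, and the sample identifiers such as `page:1` and `hasTopic` are assumptions made for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: resource (subject), attribute (predicate), attribute value (object)."""
    subject: str
    predicate: str
    obj: str


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Hypothetical data: identifiers and attribute names are illustrative only.
triples = [
    Triple("page:1", "hasTopic", "economy"),
    Triple("page:1", "hasAuthor", "Kim"),
    Triple("page:2", "hasTopic", "politics"),
]

# Roughly comparable to asking "which pages have the topic economy?"
economy_pages = [t.subject for t in match(triples, predicate="hasTopic", obj="economy")]
```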
The general goal of an information retrieval system is to accurately understand both the user's intention and the large volume of stored documents, and to deliver the documents the user needs efficiently and without omission.
However, a conventional search engine such as Google has no additional indexing process between gathering and indexing: pages are indexed in 'A, B, C, D…' order, and there is one index per web page.
Therefore, a conventional search engine often returns the wrong web pages for the keyword the user enters, which makes searching inconvenient, and the user must repeatedly enter keywords until the desired information is obtained.
SUMMARY OF THE INVENTION The present invention has been proposed to solve the above problems. An object of the present invention is to provide a semantic web-based indexing method, and a search engine using the same, that build indexes on a web page in various ways through the semantic web and perform a Meaning Search on keywords entered by a user, so that the user can quickly find information suited to the user's intention.
The search engine of the present invention for achieving the above object includes: an indexing unit that converts the collected web pages into semantic web pages and analyzes them at the word, paragraph, and article levels to generate a plurality of indexes for one web page; an index database that stores the indexes of each web page generated by the indexing unit; and a search agent that searches the index database according to a user's search term and processes document search based on the semantic web. The indexing unit comprises a gathering agent that collects web pages distributed on the Internet, processes them as semantic web pages, and stores them in a semantic web page database; a semantic analysis agent that extracts the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, sets the frequency, relationship, and graph for each level, and stores the results in a semantic web analysis database; a filtering agent that filters the semantic web pages stored in the semantic web analysis database by level and stores them in a semantic web filtered database; a personality analysis agent that gives a personality to each filtered semantic web page; and a classification analysis agent that classifies the semantic web data to which a personality has been given according to the percentage (%) of each personality.
In addition, the index method of the present invention for achieving the above object includes: a web page collection step of collecting web pages distributed on the Internet, processing them as semantic web pages, and storing them in the semantic web page database; a semantic analysis step of extracting the word, paragraph, and article levels from the semantic web pages stored in the semantic web page database, setting the frequency, relationship, and graph for each level, and storing the results in the semantic web analysis database; a filtering step of filtering the semantic web pages stored in the semantic web analysis database at the word, paragraph, and article levels and storing them in the semantic web filtered database; a personality analysis step of assigning a personality to each filtered semantic web page; and a classification analysis step of classifying the semantic web pages to which a personality has been given according to the percentage (%) of each personality. The collected web pages are thus converted into semantic web pages and analyzed at the word, paragraph, and article levels, and a plurality of indexes are generated for one web page.
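The patent text does not say how the percentage of a personality is computed. One possible reading, sketched below, counts how many words of a page belong to a keyword list per personality and converts the counts to shares. The keyword lists, the 50% threshold, and the names `personality_percentages` and `classify` are all assumptions introduced for this illustration, not the patented method.

```python
from collections import Counter

# Hypothetical keyword lists per personality; the source gives no such lists.
PERSONALITY_KEYWORDS = {
    "economy": {"market", "stock", "inflation"},
    "politics": {"election", "parliament", "vote"},
}


def personality_percentages(words):
    """Return the share (%) of keyword hits that fall under each personality."""
    counts = Counter()
    for word in words:
        for personality, keywords in PERSONALITY_KEYWORDS.items():
            if word in keywords:
                counts[personality] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {p: 100.0 * c / total for p, c in counts.items()}


def classify(percentages, threshold=50.0):
    """Return the personalities whose share meets the assumed threshold, highest first."""
    ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    return [p for p, pct in ranked if pct >= threshold]


words = "the stock market fell before the election as inflation rose".split()
shares = personality_percentages(words)
labels = classify(shares)
```

With this sample sentence, three of the four keyword hits are economy terms, so the page would be filed under economy while still recording a smaller politics share.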
The web page collection step consists of a static web page collection step of collecting static web pages, which have a uniform source format and are operated by static rules, such as newspapers, forums, and editorials on the Internet, and a dynamic web page collection step of collecting dynamic web pages such as blogs or general web pages on the Internet.
The semantic analysis step may include: extracting articles from the collected semantic web pages and setting their frequency, relationship, and graph; extracting paragraphs from the collected semantic web pages and setting their frequency, relationship, and graph; and extracting words from the collected semantic web pages and setting their frequency, relationship, and graph.
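As a rough illustration of analysis at several levels, the sketch below treats the whole text as the article, splits it into paragraphs on blank lines, counts word frequencies, and records which words co-occur in the same paragraph as the edges of a simple graph. The source does not define "relationship" or "graph" precisely, so the co-occurrence interpretation, the function name `analyze_levels`, and the tokenization are assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def analyze_levels(article):
    """Split an article into paragraphs and words, then count frequencies and co-occurrences."""
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    words_per_paragraph = [re.findall(r"[a-z]+", p.lower()) for p in paragraphs]

    # Word level: how often each word appears in the whole article.
    word_frequency = Counter(w for ws in words_per_paragraph for w in ws)

    # Assumed "relationship": two words are linked when they share a paragraph.
    cooccurrence_edges = Counter()
    for ws in words_per_paragraph:
        for a, b in combinations(sorted(set(ws)), 2):
            cooccurrence_edges[(a, b)] += 1

    return {
        "paragraphs": paragraphs,
        "word_frequency": word_frequency,
        "cooccurrence_edges": cooccurrence_edges,
    }


result = analyze_levels("semantic web search\n\nsemantic index")
```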
The filtering step may include: refining the data to be deleted from the articles of the analyzed semantic web pages; refining the data to be deleted from the paragraphs of the analyzed semantic web pages; and refining the data to be deleted from the words of the analyzed semantic web pages.
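The source does not list what counts as data to be deleted at each level. The following sketch assumes three simple rules purely for illustration: stop words are dropped at the word level, paragraphs with too few remaining words are dropped at the paragraph level, and an article with no surviving paragraph is dropped entirely. `STOP_WORDS`, `MIN_PARAGRAPH_WORDS`, and the function names are hypothetical.

```python
# Assumed filtering rules; the patent does not specify them.
STOP_WORDS = {"the", "a", "an", "of", "and"}
MIN_PARAGRAPH_WORDS = 3


def filter_words(words):
    """Word level: drop stop words."""
    return [w for w in words if w.lower() not in STOP_WORDS]


def filter_paragraphs(paragraphs):
    """Paragraph level: keep paragraphs that still have enough words after word filtering."""
    kept = []
    for paragraph in paragraphs:
        words = filter_words(paragraph.split())
        if len(words) >= MIN_PARAGRAPH_WORDS:
            kept.append(" ".join(words))
    return kept


def filter_article(article):
    """Article level: return the cleaned article, or None when nothing survives."""
    paragraphs = [p for p in article.split("\n\n") if p.strip()]
    kept = filter_paragraphs(paragraphs)
    return "\n\n".join(kept) if kept else None


cleaned = filter_article(
    "The index of the semantic web page\n\nClick here\n\nAn agent and a database of pages"
)
```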
In the indexing method using the semantic web of the present invention, additional indexing takes place between gathering and indexing, so hundreds of indexes may exist for one web page. These indexes are word- and paragraph-oriented.
Therefore, while a conventional search engine such as Google returns search results from a stored DB built with a single indexing method, a search engine to which the present invention is applied holds hundreds of semantic web indexes that capture the meaning of the words in one web page, which makes Meaning Search possible.
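A minimal sketch of how one page might yield many word- and paragraph-oriented index entries is shown below. The data layout (a word index pointing at paragraph positions and a paragraph index holding each paragraph's word set) and the name `build_page_indexes` are assumptions for illustration; the patent does not disclose the index format.

```python
from collections import defaultdict


def build_page_indexes(page_id, article):
    """Return word-level and paragraph-level index entries for a single page."""
    word_index = defaultdict(list)  # word -> [(page_id, paragraph_no), ...]
    paragraph_index = {}            # (page_id, paragraph_no) -> set of words
    paragraphs = [p for p in article.split("\n\n") if p.strip()]
    for number, paragraph in enumerate(paragraphs):
        words = set(paragraph.lower().split())
        paragraph_index[(page_id, number)] = words
        for word in sorted(words):
            word_index[word].append((page_id, number))
    return dict(word_index), paragraph_index


word_index, paragraph_index = build_page_indexes(
    "page:1", "semantic web\n\nsemantic search engine"
)
```

Even this two-paragraph page produces four word entries and two paragraph entries, which suggests how a full page could carry a large number of indexes.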
The technical problems achieved by the present invention and the practice of the present invention will be more clearly understood by the preferred embodiments of the present invention described below. The following examples are merely illustrative of the present invention and are not intended to limit the scope of the present invention.
FIG. 2 illustrates the overall structure of a semantic web based search engine according to the present invention.
As shown in FIG. 2, the semantic web-based search engine according to the present invention is implemented in a semantic web-based
Referring to FIG. 2, the search engine according to the present invention has a main solution group consisting of seven agents. The Policy Agent (PA) 220, located above all other agents, is responsible for requesting specific functions from the agents and directing policy functions.
The Gathering Agents (GA) 211, 212 collect web pages, the Filter Agent (FA) 213 refines data (changes it into a usable form), and the Analysis Agent (AA) 214 analyzes the collected data and stores the index in the
The search agent (SA) 204 processes ontology search and semantic web document search, and the monitoring agent (MA) 224 detects calculation errors of the
FIG. 3 is a flowchart illustrating a procedure of indexing using the semantic web according to the present invention, and FIG. 4 is an example of a semantic web indexing solution using the semantic web according to the present invention.
As shown in FIG. 3, the indexing process using the semantic web according to the present invention generates the index DB for semantic web search through a web page collection step (S301), a semantic analysis step (S302), a filtering step (S303), a personality analysis step (S304), and a classification step (S305).
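The sequence of steps S301 through S305 can be pictured as a chain of functions, each taking the output of the previous one. The sketch below is a deliberately simplified, in-memory stand-in: the step bodies, the page dictionaries, and the single keyword rule used to assign a category are placeholders invented for illustration, not the agents described in the patent.

```python
def collect(raw_pages):
    """S301 stand-in: wrap raw page text in simple records."""
    return [{"id": i, "text": text} for i, text in enumerate(raw_pages)]


def analyze(pages):
    """S302 stand-in: extract a word list for each page."""
    for page in pages:
        page["words"] = page["text"].lower().split()
    return pages


def filter_pages(pages):
    """S303 stand-in: drop pages with no usable words."""
    return [page for page in pages if page["words"]]


def assign_personality(pages):
    """S304 stand-in: label each page using one assumed keyword rule."""
    for page in pages:
        page["personality"] = "economy" if "market" in page["words"] else "general"
    return pages


def classify(pages):
    """S305 stand-in: group page ids by their assigned label."""
    groups = {}
    for page in pages:
        groups.setdefault(page["personality"], []).append(page["id"])
    return groups


STEPS = [collect, analyze, filter_pages, assign_personality, classify]


def run_pipeline(raw_pages):
    """Apply the five steps in order and return the grouped result."""
    data = raw_pages
    for step in STEPS:
        data = step(data)
    return data


index_db = run_pipeline(["Market prices rose", "", "Semantic web agents"])
```

In this toy run the empty page is removed during filtering, and the remaining two pages end up in separate groups.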
The web page collection step (S301) takes web pages from the Internet, processes the unrefined data of dynamic and static web pages as web data for semantics, and stores it in the semantic web page database (402). To this end, the Gathering Agent for Semantic pages at Static Web pages (GA.S.ST) solution (401a) collects web pages from the static Internet 102-1, which is a set of web pages that have a uniform source format and are operated by static rules, such as newspapers, forums, and editorials, processes them as data for semantic pages, and stores them in the semantic
The semantic analysis step S302 is performed in the web page collection step S301 and stores the semantic web pages stored in the semantic
The filtering step S303 is a step of removing or refining the junk data to be deleted from the data stored in the semantic
The Semantic Paragraph Filtering Agent (FA.SP)
The personality analysis step (S304) is a step of giving a personality, such as economy, politics, culture, or entertainment, to each semantic web page of the stored semantic web filtered
The classification step (S305) is a step of grouping and extracting the semantic web data to which a personality has been assigned and classifying the analysis data. The classification agent (AA.GS) Analysis Agent for Grouping at Semantic
The present invention has been described above with reference to one embodiment shown in the drawings, but those skilled in the art will understand that various modifications and equivalent other embodiments are possible therefrom.
FIG. 1 is a diagram showing the structure of a general search engine;
FIG. 2 is a diagram illustrating the overall structure of a semantic web based search engine according to the present invention;
FIG. 3 is a flowchart illustrating a procedure of indexing using the semantic web according to the present invention;
FIG. 4 illustrates a detailed example of indexing using the semantic web in accordance with the present invention.
Claims (7)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20070097332 | 2007-09-27 | ||
KR1020070097332 | 2007-09-27 |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20090033149A KR20090033149A (en) | 2009-04-01 |
KR101044633B1 true KR101044633B1 (en) | 2011-07-01 |
Family
ID=40759656
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020080095268A KR101044633B1 (en) | 2007-09-27 | 2008-09-29 | Semantic Web-based Index Method and Search Engine Using the Same |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101044633B1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102181896B1 (en) * | 2014-07-02 | 2020-11-23 | 삼성전자 주식회사 | A method and system for presenting content on an electronic device |
US10241994B2 (en) | 2014-07-02 | 2019-03-26 | Samsung Electronics Co., Ltd. | Electronic device and method for providing content on electronic device |
KR101589279B1 (en) * | 2014-08-29 | 2016-01-28 | 한국전자통신연구원 | Apparatus and method of classifying industrial control system webpage |
CN107193873A (en) * | 2017-04-17 | 2017-09-22 | 吉林工程技术师范学院 | A kind of network search method based on semantic network technology |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002251394A (en) | 2001-02-22 | 2002-09-06 | Nec Corp | Whole sentence retrieval system |
KR20060103165A (en) * | 2005-03-23 | 2006-09-28 | 조광현 | Classified web sites search system and method |
KR20060135173A (en) * | 2005-06-24 | 2006-12-29 | 동아시테크주식회사 | File management system |
KR100759186B1 (en) | 2006-05-29 | 2007-09-14 | 주식회사 케이티 | System and method to provide web service that delivers information from semi structured web document and database |
- 2008-09-29: KR application KR1020080095268A, patent KR101044633B1, not active (IP Right Cessation)
Also Published As
Publication number | Publication date |
---|---|
KR20090033149A (en) | 2009-04-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Johnson et al. | Web content mining techniques: a survey | |
JP4644420B2 (en) | Method and machine-readable storage device for retrieving and presenting data over a network | |
US8473473B2 (en) | Object oriented data and metadata based search | |
US8965894B2 (en) | Automated web page classification | |
US20150032728A1 (en) | System and method of generating a set of search results | |
Stuckenschmidt et al. | Exploring large document repositories with RDF technology: The DOPE project | |
Sharma et al. | The anatomy of web crawlers | |
KR101224800B1 (en) | Crawling database for infomation | |
KR100800460B1 (en) | System and method for retrieving/classifying web ontology | |
KR101044633B1 (en) | Semantic Web-based Index Method and Search Engine Using the Same | |
López et al. | An efficient and scalable search engine for models | |
Saini et al. | Review on web content mining techniques | |
KR101038337B1 (en) | Ontology based index method and search engine using the same | |
KR102107474B1 (en) | Social issue deduction system and method using crawling | |
Maciołek et al. | Cluo: Web-scale text mining system for open source intelligence purposes | |
Broughton | Facet analytical theory as a basis for a knowledge organization tool in a subject portal | |
EP3535661A2 (en) | A system for managing, analyzing, navigating or searching of data information across one or more sources within a computer or a computer network, without copying, moving or manipulating the source or the data information stored in the source | |
KR101476225B1 (en) | Method for Indexing Natural Language And Mathematical Formula, Apparatus And Computer-Readable Recording Medium with Program Therefor | |
KR101665649B1 (en) | System for analyzing social media data and method for analyzing social media data using the same | |
EP2411930A2 (en) | A system for automatic semantic-based mining | |
WO2009030248A1 (en) | Detecting correlations between data representing information | |
Huynh et al. | Integrating bibliographical data of computer science publications from online digital libraries | |
Shekhar et al. | A WEBIR crawling framework for retrieving highly relevant web documents: evaluation based on rank aggregation and result merging algorithms | |
Saranya et al. | A Study on Competent Crawling Algorithm (CCA) for Web Search to Enhance Efficiency of Information Retrieval | |
Dhingra et al. | Semcrawl: framework for crawling ontology annotated web documents for intelligent information retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant | ||
LAPS | Lapse due to unpaid annual fee |