WO2013157712A1

WO2013157712A1 - Information search device, information search method, and computer-readable recording medium

Info

Publication number: WO2013157712A1
Application number: PCT/KR2012/009982
Authority: WO
Inventors: Park Suk-Il
Original assignee: Park Suk-Il
Priority date: 2012-04-17
Filing date: 2012-11-23
Publication date: 2013-10-24
Also published as: KR20130117126A; KR101347123B1

Abstract

The present invention relates to an information search device, an information search method, and a computer-readable recording medium. According to one embodiment, the information search device comprises: an interface unit that provides a user terminal device with information on a search window containing a search entity field, a directory field, a record field, and an intention field; a storage unit including a database in which content information, classified by being indexed into a plurality of search categories, is recorded; and a control unit that, when search words are input into the fields of the search window, sequentially searches the plurality of search categories using the search word input in each field.

Description

Information retrieval apparatus, information retrieval method, and computer-readable recording medium

The present invention relates to an information retrieval apparatus, an information retrieval method, and a computer-readable recording medium. More specifically, it relates to an information retrieval apparatus, an information retrieval method, and a computer-readable recording medium that can, for example, grasp the intention of an information user and the context of the keywords the user enters, and provide information corresponding to that intention and context.

Generally known search engine technology is based on keyword search terms: it analyzes and indexes the words in web data and returns information data that matches the keywords entered by the information user. A search engine is software that makes it easy to find data on the Internet.

However, because such a technique analyzes web data in terms of individual words and their frequencies, it does not understand the meaning of a word in the overall context of the web page, nor does it accurately grasp the overall context among the search terms.

As a result, a conventional retrieval technique returns a large amount of information regardless of the information user's intention or the context of the web data, forcing the user to sift through the results again for the information actually needed.

In addition, a word-analytic keyword search engine analyzes, stores, and indexes every word on a given web page, so a single piece of web data ends up stored repeatedly, and maintaining those copies incurs cost.

Embodiments of the present invention provide an information retrieval apparatus, an information retrieval method, and a computer-readable recording medium that, when analyzing an information user's search words, can provide information by accurately grasping the searcher's intention and the overall context among the search terms.

An information retrieval apparatus according to an embodiment of the present invention includes: an interface unit that provides a user terminal device with information on a search window containing a search entity field, a directory field, a record field, and an intention field; a storage unit including a database in which content information, classified and indexed into a plurality of search categories, is recorded; and a control unit that, when a search word is input into each field of the search window, sequentially searches the plurality of search categories using the search word input in each field.

The control unit determines the directory level to which the directory search term entered in the directory field belongs and the record level to which the record search term entered in the record field belongs; searches, within the plurality of search categories and based on the determined directory level and record level, for content containing the search term entered in the search entity field; uses the retrieved content to restrict the directory level and the record level each to a lower level; and re-searches for content within the range of the retrieved content using the restricted directory level and record level.

The control unit determines the user's intention by analyzing the retrieved content, uses that intention to restrict the directory level and the record level each to a lower level, and filters the retrieved content against the plurality of search categories using the restricted directory level and record level.

The control unit may determine the level of each field from the search words input in the search entity, directory, record, and intention fields, and determine the user's intention according to the determined levels.

When the retrieved content includes content corresponding to the user's intention, the control unit may stop re-searching and control the interface unit to provide the user terminal device with a search result screen containing the retrieved content.

When the number of retrieved content items is within a preset number, the control unit may stop re-searching and control the interface unit to provide the user terminal device with a search result screen containing the retrieved content.
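The search-and-narrow loop described in the preceding paragraphs can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: each content item is assumed to carry a directory path and a record path (top-level attribute first) plus a set of intent labels, a "level" is modeled as a path prefix, and "restricting to a lower level" is approximated by the hypothetical heuristic of extending the prefix with the most common next node among the current results. The names `Content`, `narrow`, `search`, and `max_results` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Content:
    title: str
    directory: tuple  # directory path, top-level attribute first (assumed layout)
    record: tuple     # record path, top-level attribute first (assumed layout)
    intents: frozenset = frozenset()  # hypothetical intent labels, e.g. {"listening"}


def within(path, prefix):
    """True if `path` lies at or below the level described by `prefix`."""
    return path[:len(prefix)] == prefix


def narrow(prefix, results, attr):
    """Hypothetical heuristic: move one level lower along the most common branch."""
    nxt = Counter(getattr(c, attr)[len(prefix)] for c in results
                  if within(getattr(c, attr), prefix) and len(getattr(c, attr)) > len(prefix))
    return prefix + (nxt.most_common(1)[0][0],) if nxt else prefix


def search(db, entity, dir_level, rec_level, intent, max_results=3, max_rounds=10):
    # Initial search: entity term inside the given directory and record levels.
    results = [c for c in db if entity.lower() in c.title.lower()
               and within(c.directory, dir_level) and within(c.record, rec_level)]
    for _ in range(max_rounds):
        # Stop conditions: intent-matching content found, or few enough results.
        if any(intent in c.intents for c in results) or len(results) <= max_results:
            break
        # Restrict the directory level first, then the record level, so the
        # refined set is never empty.
        dir_level = narrow(dir_level, results, "directory")
        refined = [c for c in results if within(c.directory, dir_level)]
        rec_level = narrow(rec_level, refined, "record")
        refined = [c for c in refined if within(c.record, rec_level)]
        if refined == results:  # no lower level left to restrict to
            break
        results = refined  # re-search within the previously found content
    return results
```

Narrowing the directory before the record is a design choice made here so that each round keeps at least one item; the text itself does not specify an order.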

The plurality of search categories may include keyword, user, directory, record, producer, container, folksonomy, and network.

An information retrieval method according to an embodiment of the present invention includes: providing information on a search window containing a search entity field, a directory field, a record field, and an intention field; when a search word is input into each field of the search window, determining the directory level to which the directory search term entered in the directory field belongs and the record level to which the record search term entered in the record field belongs; sequentially searching, within a plurality of preset search categories and based on the determined directory level and record level, for content containing the search term entered in the search entity field; and restricting the directory level and the record level each to a lower level using the retrieved content, and re-searching for content within the range of the retrieved content using the restricted directory level and record level.

Re-searching the content may include: analyzing the retrieved content to determine the user's intention; restricting the directory level and the record level each to a lower level using the user's intention; and filtering the retrieved content against the plurality of search categories using the restricted directory level and record level.

Determining the user's intention may include determining the level of each field from the search words input in the search entity, directory, record, and intention fields, and determining the user's intention according to the determined levels.

The information retrieval method may further include stopping the re-search when the retrieved content includes content corresponding to the user's intention, and providing a search result screen containing the retrieved content.

The information retrieval method may further include stopping the re-search when the number of retrieved content items is within a predetermined number, and providing a search result screen containing the retrieved content.

In addition, a computer-readable recording medium according to an embodiment of the present invention stores a program for executing an information retrieval method, the method including: providing information on a search window containing a search entity field, a directory field, a record field, and an intention field; when a search word is input into each field of the search window, determining the directory level to which the directory search term entered in the directory field belongs and the record level to which the record search term entered in the record field belongs; searching, within a plurality of search categories and based on the determined directory level and record level, for content containing the search term entered in the search entity field; and restricting the directory level and the record level each to a lower level using the retrieved content, and re-searching for content within the range of the retrieved content using the restricted directory level and record level.

Embodiments of the present invention can accurately grasp the intention of the information user and the context of the keywords the user enters, and provide accurate information corresponding to that intention and context. In addition, because only information data corresponding to the user's intention is provided, data traffic on the communication network 110 and elsewhere may be reduced.

Embodiments of the present invention can also significantly reduce data storage, yielding large power savings compared with the data centers of conventional search engines. Data centers are today among the world's leading industrial consumers of energy and sources of CO2 emissions. In particular, an existing keyword search engine classifies, subdivides, and reclassifies the body of a web page several times for word analysis, so a single web page is copied and stored several times. Indexing, storing, and serving such data consumes more electricity and requires more computer equipment as the data grows. According to embodiments of the present invention, information data storage management can be reduced dramatically, which would allow search engines, and web service companies more broadly, to cut carbon emissions substantially, a key concern in the web services industry.

FIG. 1 is a view showing the structure of an information retrieval system according to an embodiment of the present invention;

FIG. 2 is a flowchart briefly showing the functions of the information retrieval apparatus of FIG. 1;

FIG. 3 is a diagram schematically illustrating web world subjects and web world categories;

FIG. 4 is a diagram illustrating a category arrangement and the interdependencies among categories;

FIG. 5 is a diagram illustrating the category arrangement for each search engine search;

FIG. 6 is a diagram showing the search term fields separated in a search window;

FIG. 7 is a block diagram showing the structure of the information retrieval apparatus of FIG. 1;

FIG. 8 is a diagram illustrating the HTML source code of a main page;

FIG. 9 is a view for explaining index alignment by the index processing unit of FIG. 7; and

FIG. 10 is a flowchart illustrating an information retrieval method according to an embodiment of the present invention.


Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the structure of an information retrieval system according to an embodiment of the present invention, FIG. 2 is a flowchart briefly showing the functions of the information retrieval apparatus of FIG. 1, and FIG. 3 is a diagram illustrating web world subjects and web world categories. FIG. 4 is a diagram illustrating a category arrangement and the interdependencies among categories, FIG. 5 is a diagram illustrating the category arrangement for each search engine search, and FIG. 6 is a diagram showing the search term fields separated in a search window.

As shown in FIG. 1, an information retrieval system according to an exemplary embodiment of the present invention may include some or all of a user terminal device 100, a communication network 110, web servers 120_1 and 120_2, and an information retrieval apparatus 130. For a full understanding of the invention, the system is described as including all of them.

Here, the user terminal device 100 may operate in various wired and wireless environments and may include a personal digital assistant (PDA), a cellular phone, a smartphone, and the like, including any Personal Communication Service (PCS) phone, Global System for Mobile Communications (GSM) phone, wideband CDMA (W-CDMA) phone, CDMA-2000 phone, or Mobile Broadband System (MBS) phone. Here, an MBS phone refers to a terminal intended for the next-generation system currently under discussion. Furthermore, the user terminal device 100 according to the embodiment of the present invention may also include a desktop computer, a laptop computer, and the like.

The user terminal device 100 may include a wireless communication module and a wireless LAN module, and may further include a GPS module. With the wireless communication module, the user terminal device 100 accesses a wired/wireless communication network to perform ordinary voice calls and data communication with another party. With the wireless LAN module, the user terminal device 100 may receive various web page data by accessing the communication network 110 through an access point (AP) detected nearby. Furthermore, user terminal devices 100 may be classified into GPS terminals and non-GPS terminals depending on whether a GPS module is provided; a user terminal device 100 with a GPS module receives data provided by GPS satellites.

To connect to the Internet via the communication network 110, the user terminal device 100 may use the Wireless Application Protocol (WAP), an Internet access protocol such as Microsoft Internet Explorer (MIE), which is based on HTML over the HTTP protocol, the Handheld Device Transport Protocol (HDTP), NTT DoCoMo's i-mode, or a particular carrier's wireless Internet browser. Among these access protocols, MIE uses m-HTML, a slightly modified and shortened form of HTML, and i-mode uses compact HTML (c-HTML), a subset of HTML.

Recently, user terminal devices 100 such as smartphones have used wireless Internet browsers, such as Opera Mini for the iPhone or a specific carrier's browser, to provide faster wireless Internet, and such devices also use wireless networks such as Wi-Fi and WiBro to provide high-speed wireless Internet.

The communication network 110 includes both wired and wireless communication networks. The wired network includes Internet networks such as a cable network or the public switched telephone network (PSTN), and the wireless communication network includes CDMA, WCDMA, GSM, Evolved Packet Core (EPC), Long Term Evolution (LTE), and WiBro networks, among others. Accordingly, when the communication network 110 is a wired communication network, the AP forming the local area network may be connected to a telephone company's exchange office; when it is a wireless communication network, the AP may be connected to a Serving GPRS Support Node (SGSN) or Gateway GPRS Support Node (GGSN) operated by the carrier, or to various relays such as a Base Transceiver Station (BTS), NodeB, or eNodeB, to process data.

The web servers 120_1 and 120_2 include all servers that provide information on the web: not only servers running search engines specialized in information retrieval, such as Naver, Google, or Yahoo, but also servers operated by ordinary companies or individuals that do not specialize in information retrieval. The web servers 120_1 and 120_2 may support a general search method, in which the user enters a keyword (that is, a search word) directly, and a category search, which narrows the scope as the user selects the desired item from several items suggested by the search engine.

Unlike the web servers 120_1 and 120_2 above, the information retrieval apparatus 130 according to the embodiment of the present invention may use a different search method. It includes a DB 130a that stores data generated on the basis of natural language processing methodology and the principles of object-oriented programming theory, and may provide a website for information retrieval. When the user terminal device 100 accesses the information retrieval apparatus 130, the apparatus receives from the user terminal device 100 a plurality of search keywords that the user entered in a plurality of fields, uses them to perform a search of the DB 130a that matches the user's intention, and provides the result corresponding to that intention to the user terminal device 100.

Specifically, the information retrieval apparatus 130 according to an embodiment of the present invention may collect and index information data (or web data) from websites, receive the search words that an information user enters in the search box, and analyze those search words to find the needed information in the index. The indexed information is organized into eight key categories, for example: keyword (K), user (U), directory (D), record (R), producer (P), container (C), folksonomy (F), and network (N). Of these, the categories the information user enters in the search box are the entity (S), which corresponds to the key keyword (K), the directory (D), the record (R), and the intention, which belongs to the user (U) category. The search engine analyzes these four categories to identify the information user's intention and the context of the search terms, and then uses the server-side information held in the other categories to complete the analysis of the grammar of the user's search and of the user's intention.
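The split between the four search-window fields and the eight indexed categories can be expressed as a small data structure. The sketch below is one reading of the paragraph above, assuming the entity field maps to the keyword (K) category and the intention field to the user (U) category; the names `Category`, `FIELD_TO_CATEGORY`, `SERVER_SIDE`, and `categorize_query` are hypothetical.

```python
from enum import Enum


class Category(Enum):
    """The eight key categories, with the letters used in the description."""
    KEYWORD = "K"
    USER = "U"
    DIRECTORY = "D"
    RECORD = "R"
    PRODUCER = "P"
    CONTAINER = "C"
    FOLKSONOMY = "F"
    NETWORK = "N"


# Assumed mapping from the four search-window fields to their categories.
FIELD_TO_CATEGORY = {
    "entity": Category.KEYWORD,
    "directory": Category.DIRECTORY,
    "record": Category.RECORD,
    "intention": Category.USER,
}

# Categories the user does not type in; consulted from server-side data.
SERVER_SIDE = [c for c in Category if c not in FIELD_TO_CATEGORY.values()]


def categorize_query(fields: dict) -> dict:
    """Map the non-empty, recognized search-window fields onto categories."""
    return {FIELD_TO_CATEGORY[name]: term.strip()
            for name, term in fields.items()
            if name in FIELD_TO_CATEGORY and term.strip()}
```

For the example query used later in the description, `categorize_query({"entity": "Song of May", "intention": "listening"})` would place "listening" under `Category.USER`.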

Here, web data is web information that has its own URL in the web world; an information user is a person who uses information, that is, web information; and a search keyword is a word entered in the search box to look for information, with the box divided into separate fields. A web page directory is a taxonomy category of the web world that classifies information by specific criteria into superordinate and subordinate categories. A web page record represents the data structure of a web page. A producer is the person or organization that produced a web page having a unique URL. A container is the website in the web world to which a web page belongs. Folksonomy is the web grammar created by the web public as a result of the spontaneous web activity of the public, that is, the information users of the web world. A network is the interconnection structure of the nodes that make up the web world.

The information retrieval apparatus 130 according to the embodiment of the present invention analyzes the intention and context of an information object and of the information user's search words through the mutual qualification of categories. For example, in search term analysis, when an information user enters search terms for the four categories in the search box, the search engine runs an algorithm in which the entity is qualified by the directory and the record is qualified by the intention, which lets it analyze the intention and context of the search terms. Suppose the entity is 'Song of May', the directory is 'Baby-Faced Beauty', the record is 'Jang Na-ra', and the intention is 'listening', meaning the user wants to listen to 'Song of May'. The entity 'Song of May' is qualified by the directory 'Baby-Faced Beauty', the directory 'Baby-Faced Beauty' is qualified by the record 'Jang Na-ra', and the record 'Jang Na-ra' is qualified by the intention 'listening'. Conversely, 'Song of May' is qualified in reverse order by the intention 'listening', the record 'Jang Na-ra', and the directory 'Baby-Faced Beauty' (or in sequence). In this context, qualification means that the scope is further narrowed (or specified) and the terms are linked to one another.
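The chain of qualifications in the 'Song of May' example behaves like a sequence of scope-narrowing filters. The sketch below is illustrative only: the `Item` fields, the use of a performer name as the record value, and the `INTENT_MEDIA` table that turns an intention such as 'listening' into acceptable media types are all assumptions, and the sample data in any usage is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str      # entity value
    directory: str  # directory value
    performer: str  # record value (assumed to be a performer name here)
    media: str      # "audio", "video", or "text" (hypothetical attribute)


# Hypothetical table: which media types satisfy each intention word.
INTENT_MEDIA = {"listening": {"audio"}, "watching": {"video"}, "reading": {"text"}}


def qualify(items, entity, directory, record, intention):
    """Narrow the candidate scope one qualification at a time.

    Returns the final scope and the scope size after each step, so the
    narrowing is visible.
    """
    sizes = []
    scope = [i for i in items if i.title == entity]
    sizes.append(len(scope))
    scope = [i for i in scope if i.directory == directory]  # entity qualified by directory
    sizes.append(len(scope))
    scope = [i for i in scope if i.performer == record]     # directory qualified by record
    sizes.append(len(scope))
    accepted = INTENT_MEDIA.get(intention, set())
    scope = [i for i in scope if i.media in accepted]       # record qualified by intention
    sizes.append(len(scope))
    return scope, sizes
```

Because every step here is a conjunctive filter, applying the same qualifications in reverse order (intention, record, directory, then entity) yields the same final set, which fits the remark that the chain can also be read in reverse.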

Accordingly, in the present invention, each category is organized as a dependent hierarchical tree running from its uppermost attribute down to its lowest attribute, that is, its depth. In the directory category, for example, 'Baby-Faced Beauty' can have the directory path [Culture > Entertainment > Broadcast > Drama > Korean Drama > KBS2 > 2011 Drama > Mon-Tue Drama]. In the search engine according to an embodiment of the present invention, the algorithm that analyzes the eight categories in order to interpret the intention and context of specific information data, or of a specific user's search terms, therefore iterates through filtering steps over the sub-attributes of these eight categories until it arrives at a final analysis of intention and context. In this iterative filtering, each category further refines its own value by comparing it against the values of the other categories, which rests on the mutual qualification described above.
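A category hierarchy of this kind, and the level (depth) of a term within it, can be represented with a simple child-to-parent map. The sketch below encodes the directory path given above for the drama example ('Baby-Faced Beauty'); the map, the function names, and the convention that the uppermost attribute is level 0 are assumptions for illustration.

```python
# Hypothetical child -> parent map encoding the directory path from the example.
PARENT = {
    "Entertainment": "Culture",
    "Broadcast": "Entertainment",
    "Drama": "Broadcast",
    "Korean Drama": "Drama",
    "KBS2": "Korean Drama",
    "2011 Drama": "KBS2",
    "Mon-Tue Drama": "2011 Drama",
    "Baby-Faced Beauty": "Mon-Tue Drama",
}


def path_to_root(node):
    """Return the path from the uppermost attribute down to `node`."""
    path, seen = [node], {node}
    while path[-1] in PARENT:
        parent = PARENT[path[-1]]
        if parent in seen:  # guard against a malformed (cyclic) map
            raise ValueError(f"cycle at {parent!r}")
        seen.add(parent)
        path.append(parent)
    return path[::-1]


def level(node):
    """Depth of `node`; the uppermost attribute is level 0 (assumed convention)."""
    return len(path_to_root(node)) - 1


def is_under(node, ancestor):
    """True if `ancestor` appears on the path above (or at) `node`."""
    return ancestor in path_to_root(node)
```

With this map, `level("Baby-Faced Beauty")` is 8, and `is_under("Baby-Faced Beauty", "KBS2")` holds, which is the kind of check a directory level restriction would rely on.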

In more detail, to perform a search that corresponds to the user's intention, the information retrieval apparatus 130 first builds a new search engine on the theoretical principles of object-oriented theory and ontology theory. An ontology is a technology that explicitly defines, in a form computers can understand, the types of concepts and the constraints on their use that the subjects involved with web information have agreed on through discussion of the web world in web communication. In other words, it is an explicit, computer-understandable logical set of categories, applicable by search engines and the like to every subject's web activity. Accordingly, in the embodiment of the present invention, as shown in FIGS. 2 and 3, an information user, a search engine, an information producer, and information communication results are set as web search subjects around information objects such as web data, and the web search process is one that these subjects agree on and the computer understands. Object orientation, in turn, derives from object-oriented theory the categories that define the web activities of the web world's subjects, and treats each category as an object with properties and behaviors that interfaces with other category objects. The sum of the variable values of these categories is the variable value of the identifier of the web data, that is, the ID of the information data.

More specifically, object orientation sets categories as objects according to object-oriented theory, as shown in FIGS. 2 to 5. Each object has attributes and messages, represents its attributes as variables, and derives variable values to process the data that is its subject. Each category object refers to the values of other category objects to derive its own variable values and inherits values from other objects. In addition, each object inherits from the top-level object to create sub-objects with more detailed attributes of its own, subdividing the categories to process detailed task data.

The search engine of the information retrieval apparatus 130 according to the embodiment of the present invention derives the variable values of the categories for a given web page at various stages of the search, and combines those values in a synthesis stage to extract the page's identity. In this synthesis stage, the overall judgment about the web page is that the variable value of a given category is cross-qualified by the values of the other categories as well as determined by its own attributes. For example, the search term category value 'Park Joo-young' is qualified by 'soccer player' in the directory category, which excludes every Park Joo-young other than the soccer player. A comprehensive judgment is made by combining the category variable values on the basis of this cross-qualification.

In summary, the information retrieval apparatus 130 according to the embodiment of the present invention takes on the technical task of building a new type of search engine based on ontology theory and object-oriented theory. The technical problems and solutions are as follows.
1) Based on the observation that the web communication of web subjects in the search world shares certain categories that define each subject's web activity, the embodiment provides a series of grammars for web communication, that is, rules that a search engine can understand.
2) It establishes web information data, information producers, information users, search engine platforms, information networks (the spontaneous law of information communication), and other entities as the web subjects that sustain the web world's communication process.
3) It derives the categories commonly applied to the web activities of web communication subjects in the web world and gives them a category grammar. These categories include search keywords, information users, the directory of information objects, information object records, producers, relevant websites, folksonomy for search popularity, information networks for information communication, and others. The embodiment sets various categories for the search engine, treats each as an object, and gives each object its own attributes and behaviors. These objects refer to the variable values of other objects, which gives a grammar to the interdependencies and qualifications among categories through which each obtains its own variable value. From this, variable values for the self-identity of the web data are derived.
4) The web crawling process, the indexing process, the information user's keyword input process, and the search server process that analyzes those keywords to find web information and provide it to the user are all derived from the above category grammar. Here, search refers to matching the self-identity variable values of the web subjects; that is, it is a search algorithm technique that connects the context and intention of an information object with the context and intention of the information user's keywords.

As described above, the information retrieval apparatus 130 according to the embodiment of the present invention finds a certain regularity underlying the web activity of web world subjects, creates a category grammar that all web subjects can share, and uses these category functions to perform a search that maps the information user's intention and the context of the user's keywords to information objects, that is, web pages and their producers. To this end, the information retrieval apparatus 130 assigns the same categories both when indexing information data and when parsing the information user's search words, and derives the category values for each. Mapping the variable values, that is, the identities, of the information data and of the search terms matches the intention of the information producer inherent in the information data with the intention and context inherent in the search words.
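Matching identities between a search query and indexed web data can be sketched as a comparison of per-category values. In this minimal sketch, an identity is assumed to be the set of (category, value) pairs of an item; a document contradicted by any category the query specifies is excluded (as in the 'Park Joo-young' example, where the directory 'soccer player' excludes other people of that name); and the remaining documents are ranked by how many query categories they agree with. The ranking rule and all names are hypothetical.

```python
def identity(values):
    """Hypothetical identity: sorted (category, value) pairs with empty values dropped."""
    return tuple(sorted((cat, val) for cat, val in values.items() if val))


def rank(query, docs):
    """Return document names ordered by agreement with the query identity.

    A document is excluded when it holds a different value for any category
    the query specifies (cross-qualification); a document that simply lacks
    that category is kept but scores lower. Ties are broken by name.
    """
    q = identity(query)
    scored = []
    for name, values in docs.items():
        d = dict(identity(values))
        if any(cat in d and d[cat] != val for cat, val in q):
            continue
        score = sum(d.get(cat) == val for cat, val in q)
        if score:
            scored.append((-score, name))
    return [name for _, name in sorted(scored)]
```

Keeping documents that lack a category, rather than excluding them, is a design choice made here; a stricter reading of cross-qualification would require every queried category to be present.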

In this process, when there is a request from the user terminal device 100, for example, the information retrieval apparatus 130 provides a search window divided into four fields, or information about it, as shown in FIG. 6; receives the search terms for the four categories entered in the fields of the search window; searches the data constructed in the eight categories using the four received search terms; and provides the search results back to the user terminal device 100.

With the above configuration, an embodiment of the present invention can accurately grasp the intention of the information user and the context of the keywords the user enters, and provide accurate information corresponding to that intention and context. In addition, because only information data corresponding to the user's intention is provided, data traffic on the communication network 110 and elsewhere may be reduced.

FIG. 7 is a block diagram showing the structure of the information retrieval apparatus of FIG. 1, FIG. 8 is a diagram illustrating the HTML source code of a main page, and FIG. 9 is a diagram for explaining index alignment by the index processing unit of FIG. 7.

Referring to FIG. 7 together with FIG. 1, the information retrieval apparatus 130 according to the embodiment of the present invention may include some or all of an interface unit 700, a control unit 710, a storage unit 720, an index processing unit 730, and a search engine unit 740. The index processing unit 730 may be included in the search engine unit 740, and the control unit 710 may perform the role of the search engine unit 740; all units are described as included for ease of understanding.

Here, the interface unit 700 may include, for example, a communication module. At the request of the user terminal device 100, the interface unit 700 may, under the control of the control unit 710, provide through the communication module a search window containing a search entity field, a directory field, a record field, and an intention field, or information about that window. In this process, the interface unit 700 may additionally perform processing such as information conversion.

The control unit 710 is responsible for overall control of the interface unit 700, the storage unit 720, the index processing unit 730, and the search engine unit 740 in the information retrieval apparatus 130. For example, the control unit 710 causes the information about the search window implemented by the search engine unit 740 to be provided through the interface unit 700 and temporarily stores processed information in the storage unit 720, or stores data-construction information processed by the index processing unit 730 in the DB 130a of FIG. 1.

The storage unit 720 may include the DB 130a of FIG. 1, but it may also mean a memory such as RAM that temporarily stores information processed by the control unit 710, separately from the DB 130a. For example, data processed by the index processing unit 730 may be constructed in the DB 130a of FIG. 1, in which case the storage unit 720, as a memory, may temporarily store information processed under the control of the control unit 710. If the information retrieval apparatus 130 has no separate DB 130a, the storage unit 720 may serve as the DB 130a. The storage unit 720, together with the DB 130a, also stores (or builds up) data that the index processing unit 730 classifies into, for example, eight core categories based on mutual qualification. The core categories are keyword (K), user (U), directory (D), record (R), producer (P), container (C), folksonomy (F), and network (N). When the search engine unit 740 runs, the data built up in the storage unit 720 according to these categories and their mutual qualification is searched under the control of the control unit 710 and provided as the result.

The index processing unit 730 may construct data by forming an object-oriented grammar over the search categories according to an embodiment of the present invention, and a separate algorithm may be implemented for this purpose. For example, to construct the data, the index processing unit 730 performs crawling, the process of collecting information data, and indexing, the process of classifying the collected data; the classified information data is then stored in the storage unit 720 or in the DB 130a of FIG. 1.
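One way to picture data classified into the eight categories is a per-category inverted index in which each page's category values are stored once and each (category, value) pair points to the pages that carry it. The sketch below is an illustration built on assumptions, not the index layout of FIG. 9: the class name, the single-letter category codes, and case-insensitive matching are choices made here.

```python
from collections import defaultdict

CATEGORIES = ("K", "U", "D", "R", "P", "C", "F", "N")


class CategoryIndex:
    """Hypothetical inverted index keyed by (category, value)."""

    def __init__(self):
        self._postings = defaultdict(set)  # (category, lowercased value) -> page ids
        self._pages = {}                   # page id -> category values, stored once

    def add(self, page_id, values):
        # Note: re-adding an existing page id does not remove its old postings.
        unknown = set(values) - set(CATEGORIES)
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")
        self._pages[page_id] = dict(values)
        for cat, vals in values.items():
            # A category may carry one value or several (e.g. several directories).
            for v in ([vals] if isinstance(vals, str) else vals):
                self._postings[(cat, v.lower())].add(page_id)

    def lookup(self, **criteria):
        """Page ids matching every given category value, e.g. lookup(K="...", D="...")."""
        sets = [self._postings.get((cat, val.lower()), set())
                for cat, val in criteria.items()]
        return set.intersection(*sets) if sets else set()
```

A lookup such as `lookup(K="Song of May", D="Drama")` intersects the posting sets of both categories, so adding a category to the query can only narrow the result.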

Here, crawling is the process of mechanically visiting a specific web site or information storage server and obtaining the web information needed to index information data. According to an embodiment of the present invention, web information collection is not limited to collecting the pages of a specific web site with a web crawler; rather, it collects web information from which the various categories inherent in the information-communication relations among web actors can be extracted. For example, the information retrieval apparatus 130 collects information with a web crawler under the direction of a URL server. The URL server instructs the web crawler to collect web information from which these categories can be derived and assigns each crawling task. Analyzing the collected information by category is then the task of the indexing process. The information gathered by web crawling may include the web data body, the HTML source code of the web data, information about the producer of the web data, and the HTML source code of the main web site containing the web data.

Here, the source code of web information is the HTML that allows the web information to be displayed on a computer screen over the Internet, as shown in FIG. It contains structural information about the web page, such as the information producer, URL address, screen layout, and link information of the web information. Information producer information can be found in this web source code, and web site information means the HTML source code of the main web site that contains the web data.
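As an illustration of pulling producer and link information out of collected HTML source, the following is a minimal sketch using Python's standard `html.parser`. It assumes, hypothetically, that the producer appears in a `<meta name="author">` tag and the tags in `<meta name="keywords">`; real sites vary, and the helper names `SourceInfoParser` and `parse_source` are illustrative only.

```python
from html.parser import HTMLParser


class SourceInfoParser(HTMLParser):
    """Collect <title>, <meta name="author"/"keywords"> and <a href> values.

    Hypothetical helper: which tags actually carry producer information
    varies by site; this only reads common conventions.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.producer = None
        self.tags = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "author":
                self.producer = content  # assumed location of producer info
            elif name == "keywords":
                self.tags = [t.strip() for t in content.split(",") if t.strip()]
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])  # link information

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_source(html):
    """Return the title, producer, tags, and links found in an HTML page."""
    parser = SourceInfoParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "producer": parser.producer,
            "tags": parser.tags, "links": parser.links}
```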

The indexer also takes web data stored in a web data repository and indexes it by analyzing its data structure. Here, data structure analysis refers to analyzing the HTML source information and extracting tags from the body text of the web data in order to extract the categories inherent in the web data. As shown in FIG. 8, the HTML source code of web data contains information such as the domain address, local region, language, URL, and amount of web data, so the desired information is obtained by analyzing the HTML source code collected by the web crawler, and the result is recorded as the values of the category variables. The information classification process for creating an index is as follows.
a) Analyze the HTML source data of the web data: domain, local area (IP analysis), language, URL, volume, producers, tags (keywords), character encoding, links, appended content, parent content, and appended image/video information, and enter the results into the web data source information fields of the indexer.
b) Analyze the web site to which the web data belongs, including its web directory area (or directory category) and web page producer information, and fill in the indexer's web site and web data producer information fields.
c) Analyze the body text of the web data, beginning with word analysis, which resembles the web data analysis of existing keyword search engines. In the exemplary embodiment of the present invention, however, core keywords are extracted by analyzing the title of the web data or the tag information in its HTML source code. The accuracy of the analysis is improved by converting the context of the substance and attributes of the web data into language values that the machine (the search engine indexer) understands. In particular, core keywords are extracted based on the importance weights of the title and tags in the web data source data, and the interconnections among these keywords, and between the keywords and the corresponding web site, are analyzed. These core keywords correspond to the substance field among the search term fields entered by a search user.
d) For the core keywords extracted from the web data and expressed as machine-understandable language values, obtain network information such as their frequency of use in the web world and their linkage. From the folksonomy and network values of these keywords, obtain the folksonomy and network values of the web data. The folksonomy of the web data may be analyzed by the web log, and the information network analysis may be performed by the information network analyzer.
e) Create a field index based on the values obtained in steps a) through d).
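Step c) weights title and tag words more heavily than body words when choosing core keywords. A minimal sketch of that idea follows; the weights `TITLE_WEIGHT`, `TAG_WEIGHT`, and `BODY_WEIGHT` and the simple ASCII tokenizer are assumptions for illustration, since the description does not give concrete values.

```python
import re
from collections import Counter

# Hypothetical weights: the description only says title and tag words
# carry more importance than ordinary body words.
TITLE_WEIGHT = 3.0
TAG_WEIGHT = 2.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def core_keywords(title, tags, body, top_n=3):
    """Score words by where they appear and return the top_n core keywords."""
    scores = Counter()
    for word in tokenize(title):
        scores[word] += TITLE_WEIGHT
    for tag in tags:
        for word in tokenize(tag):
            scores[word] += TAG_WEIGHT
    for word in tokenize(body):
        scores[word] += BODY_WEIGHT
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]
```

For a page titled "Park goal" tagged "arsenal" whose body reads "park scores a goal for arsenal", this sketch returns `["goal", "park", "arsenal"]`.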

Before describing field indexing, consider the analysis of web area and producer information mentioned above. For example, visiting www.nawoopat.co.kr and examining its "About us" menu and HTML source code reveals that it is a patent law firm site, and the information producer can also be identified. From this analysis, the web site is a patent law firm; according to the directory classification table, the directory of the web site is "patent law firm", and the information producer of specific web data on the site is recorded as, for example, "Hong Gil Dong" or "Now Patent & Law Firm". The core keyword extraction method may be the same as that of existing search engines such as Naver or Google, based on word analysis of the web page body and on word frequency or tag information in the HTML source code. To illustrate the machine-understandable language briefly: the directory is organized as a directory classification list made up of directory entries and symbols, such as library call numbers. For example, if "000001" denotes a patent law firm, "000002" a law office, and "000003" an accounting office, the machine recognizes the site as a patent law firm from the symbol "000001". Because the machine itself does not know what "patent law firm" means when the analysis data is produced, the directory classification list defines "000001" as meaning "patent law firm".
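The directory classification list described above is essentially a two-way lookup between human-readable directory entries and the symbols the machine stores. A minimal sketch, using the example codes from the text; the table contents beyond those examples and the helper names are hypothetical.

```python
# Hypothetical directory classification list built from the example codes
# in the description ("000001" = patent law firm, and so on).
DIRECTORY_CODES = {
    "000001": "patent law firm",
    "000002": "law office",
    "000003": "accounting office",
}
NAME_TO_CODE = {name: code for code, name in DIRECTORY_CODES.items()}


def directory_code(label):
    """Map an analysed directory label to the symbol the machine stores."""
    return NAME_TO_CODE.get(label.strip().lower())


def directory_label(code):
    """Recover the human-readable meaning of a stored directory symbol."""
    return DIRECTORY_CODES.get(code)
```

Storing only the code keeps the index compact and language-neutral, while the list supplies the meaning whenever a result must be shown to a person.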

Next, consider how folksonomy values and network information are obtained from core keywords. Search services such as Naver or Google usually have information storage servers that analyze data and divide it into detailed areas to provide more accurate information. For example, there is a dictionary server for word analysis of search terms or web information, and a popularity server that tracks the popularity of specific search terms. For instance, if a user looking for 'Park Ju-young' on Naver mistakenly types 'Park Ji-young', the Naver search engine asks, "Did you mean Park Ju-young?"; and if the user enters 'Park Ju-young' in the search box, related search terms such as 'Park Ju-young Arsenal' and 'Park Ju-young goal' are shown below the search window. Such a service means that the Naver search engine works with a 'dictionary server' and a 'popularity server' for the search term. The search engine according to an embodiment of the present invention may likewise operate in conjunction with these various servers. The web log mentioned above is a server that analyzes the popularity of a specific search term, that is, its folksonomy, and can be understood as equivalent to Naver's popularity server. The information network server is a server that processes the link relationships between specific web data and other web data. In the exemplary embodiment of the present invention, when a core keyword is extracted, the web log server is queried to analyze the keyword's popularity and returns this popularity information to the search engine, for example the popular related terms 'Park Ju-young Arsenal' and 'Park Ju-young goal' for the search term 'Park Ju-young'. The information network server is also queried for the core keyword, for example how often web pages containing it are retrieved by many searchers and with which other search terms 'Park Ju-young' is combined, and it provides this information.

Referring to FIG. 8, the process of generating categorical variable values in index generation, based on the index information classification, is as follows.
a) keywordID: Registers the multiple core keywords of the web data.
b) docID (unique web data value): A unique code value of the web page that represents the web page URL.
c) directoryID: A directory classification system, similar to the decimal classification used to classify offline library materials, that classifies the web data by service field, top-level classification, major classification, subclass, and detailed subclass. The variable value of the directory entry of the web data is entered in this detailed field.
d) recordID: The area corresponding to the machine-readable cataloging format (MARC) of an offline library. This ID holds the attribute information of the web data: the web page title, information producer, web site address, kind of data, data creation date and creation history, data size, and data format.
e) producerID: The information value for the information producer. The producer is derived from the HTML source information and text analysis of the web page, the producer's importance rank is obtained from the information producer server, and the result is entered as the producer variable value.
f) containerID: The information value for the web site on which the web data is located. The web site's importance figure from the web site server is combined with an evaluation of how well the nature of the web data fits the character of the web site, and the result is entered as the container variable value.
g) folksonomyID: The variable value for popularity information about the web data and its tags. The folksonomy value of the web data is entered using the web log and the public web log of the search engine of the present invention.
h) networkID: The ID of the information network data of the web data. Based on the core keywords of the web data, it holds the connections between this web data and other web data. The network server provides dependencies, derivations, associations, and groupings with other linked information across the eight categories, based on the web page.
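The eight variables listed above can be pictured as the fields of one index record per web document. The following dataclass is a minimal sketch; the field types, the use of a plain dict for the MARC-like record attributes, and the example values are assumptions for illustration, not the storage format of the invention.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexRecord:
    """One indexed web document described by the eight category variables.

    Field types are illustrative assumptions.
    """
    doc_id: str              # unique value of the web page (docID)
    keyword_ids: List[str]   # core keywords (keywordID)
    directory_id: str        # directory classification code (directoryID)
    record_id: dict          # MARC-like attributes: title, date, size... (recordID)
    producer_id: str         # information producer (producerID)
    container_id: str        # web site holding the data (containerID)
    folksonomy_id: float = 0.0                            # popularity (folksonomyID)
    network_ids: List[str] = field(default_factory=list)  # linked docs (networkID)
```

A record for a page on the patent law firm site might be built as `IndexRecord(doc_id="doc-001", keyword_ids=["patent"], directory_id="000001", record_id={"title": "About us"}, producer_id="Hong Gil Dong", container_id="www.nawoopat.co.kr")`, with popularity and links filled in later from the web log and network servers.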

In more detail, in the indexer's information classification process, core keywords are extracted from the title of the web data or from the "tag" information in the HTML source code of the web data. Unique values in web data refer to the machine-understandable symbols assigned from the directory classification list during the directory analysis of web site analysis, and the classification unique value of the web data is processed in the same symbolic way. For example, as mentioned above, when "000001" denotes a patent law firm, the machine recognizes mechanically from the symbol "000001" that the site is a "patent law firm". As for producer information, the page source shown through the browser's View menu is HTML source code, so information producers are found according to the general conventions of HTML; in the case of the Chosun Ilbo, for example, the reporter's name may be the information producer, while in other cases the Chosun Ilbo itself may be labeled as the producer. Text analysis examines the body content: whether the "tags" used as keywords appear in the body, and whether the body contains links to other web data. The nature of the web data is evaluated against its web site because analysis of the web data alone may be inaccurate. For example, when analyzing specific web data on the site of Now Patent & Law Firm, the indexer analyzes the "About us" page of www.nawoopat.co.kr, determines that the site belongs to the patent law firm directory, and analyzes the web data against that directory, which improves the accuracy of the analysis. Container values are likewise treated as numbers and symbols, as mentioned above; for example, a patent law firm is assigned the unique value "000001".
For a search term, keyword, or web data, the search engine requests its popularity or information network information from the web log server or the information network server, and these servers provide it. A connection from one web data to another is a link: when a user clicks a keyword in one web page and is taken to another web page, that connection is called an information network. Network information is produced by an analysis engine that performs information connection, or link, analysis on the Internet; NetMiner, which is used in Korea, is one example, and some search engine companies operate their own information network servers that analyze the information networks of keywords or web data. For example, www.seoul.com/new/football/asnal/jypa is web data (URL /001) for an online newspaper article titled "Park Ju-young's debut goal" that links to other articles within its text; its network addresses are the URLs of linked articles such as "Premier League", "Arsene Wenger", and "Bolton".
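Link analysis starts from knowing, for each document, which documents it links to and which documents link to it. A minimal sketch of building both adjacency maps, assuming the links have already been extracted into a hypothetical dict from URL to outgoing URLs; dedicated tools such as NetMiner go much further (centrality, clustering).

```python
from collections import defaultdict


def build_link_network(pages):
    """Build outgoing and incoming link maps from extracted links.

    pages maps a URL to the list of URLs it links to (hypothetical input).
    Returning both directions lets each document's network information be
    looked up as "links to" and "linked from".
    """
    outgoing = {url: list(links) for url, links in pages.items()}
    incoming = defaultdict(list)
    for src, links in pages.items():
        for dst in links:
            incoming[dst].append(src)
    return outgoing, dict(incoming)
```

For example, if the article page links to "Premier League" and "Bolton" pages, the incoming map of "Bolton" lists the article, and a page linked from many sources can be treated as a stronger hub in the network.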

After the information data is constructed as described above, the information retrieval apparatus 130 drives the search engine unit 740 to search the data indexed and stored in the DB 130a for data matching the search terms entered through a search window with multiple fields displayed on the user terminal device 100, and provides the matching data to the user. In this case, the search engine unit 740 analyzes the intention and context of the information user's search terms, and of the corresponding information items, through the mutual qualification of categories. For example, in search term analysis, when an information user enters search terms for four categories in the search box, namely entity (S), directory (D), record (R), and intention (I), the search engine qualifies the entity by the directory, the directory by the record, and the record by the intention, and thereby analyzes the intention and context of the search terms.

Apart from the interface function of providing a search box, the role of the search engine unit 740 can be divided into a communication part with the information user, that is, a search term analysis part (information user ID (uID) analysis and search term keyword analysis); a part that searches the indexer for the information; and a part that edits this information and outputs it to the information user. Whereas the indexer is centered on information items, that is, web data, the search engine unit 740 sorts the information user's search terms according to the indexer's indexing scheme and matches them with the corresponding information.

Specifically, the user interface corresponds to the search box. The search box has several separate fields: the substance (entity) field for the core keyword the information user wants, the directory field for the kind of information the user wants, the record field for the information, and finally the intention field expressing the user's intention. The combination of fields may vary. The information user enters search keywords in the four fields, and the search terms in these fields qualify one another, each defining the meaning of the search terms in the other fields.

For example, as described above, when an information user wants to listen to Jang Nara's 'Song of May' from the drama 'Beauty', the search terms may be entered as shown in Table 1.

Table 1

Field	Substance	Directory	Record	Intention
Example	Song of May	Beauty	Jang Nara	Listening

The search term 'Song of May' is qualified by the drama 'Beauty' and by the singer 'Jang Nara', and the intention field captures the user's wish to listen to the song. Conversely, 'Jang Nara' is qualified by the drama 'Beauty' and by 'Song of May'. In this way, the search engine understands the intention and context of the information user and creates a search sentence that the machine can understand.
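The Table 1 query can be modelled as four fields that must all agree with a candidate document. The sketch below assumes, hypothetically, that each document already carries title, directory, producer, and service attributes from indexing, and that mutual qualification is approximated by requiring every field to match (a simple AND); the invention's actual qualification operators are not specified here.

```python
from dataclasses import dataclass


@dataclass
class SearchQuery:
    entity: str      # what the user is looking for, e.g. "Song of May"
    directory: str   # qualifies the entity, e.g. the drama "Beauty"
    record: str      # qualifies further, e.g. the singer "Jang Nara"
    intention: str   # what the user wants to do, e.g. "Listening"


def matches(doc, query):
    """Return True only if every field qualifies the document (assumed AND logic)."""
    return (query.entity.lower() in doc["title"].lower()
            and query.directory.lower() in doc["directory"].lower()
            and query.record.lower() in doc["producer"].lower()
            and query.intention.lower() in doc["service"].lower())
```

A page offering the song for listening matches, while a lyrics page for the same song, or a poem with the same title in another directory, does not.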

Search term analysis (kID) analyzes the keywords entered in the search box by the information user to adjust synonyms and correct spelling, checks the field of each keyword in the search box, and analyzes the user's information intention by computing the mutual qualification operators among these keywords.
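Synonym adjustment and spelling correction can be sketched with a synonym table plus approximate string matching from Python's standard `difflib`. The vocabulary and synonym table below are hypothetical stand-ins for a dictionary server, and `difflib`'s similarity ratio is a swapped-in technique, not necessarily what the invention uses.

```python
import difflib

# Hypothetical dictionary data standing in for a dictionary server.
VOCABULARY = ["park ju-young", "arsenal", "song of may", "jang nara"]
SYNONYMS = {"gunners": "arsenal"}


def normalize_term(term, cutoff=0.8):
    """Map synonyms to a canonical term, then snap near-misspellings to known terms."""
    t = term.strip().lower()
    t = SYNONYMS.get(t, t)
    match = difflib.get_close_matches(t, VOCABULARY, n=1, cutoff=cutoff)
    return match[0] if match else t
```

Here the misspelling 'Park Ji-young' is corrected to 'park ju-young', 'Gunners' becomes 'arsenal', and an unknown term is passed through unchanged.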

Regarding user ID analysis (uID), the search engine of the search engine unit 740 according to an embodiment of the present invention is a personalized search engine based on the information user's ID and password. In this step, the search engine analyzes the information user's web activity attributes and history in light of the entered keywords, finds the context of the user's intention for each keyword, and assigns a user intention value to the keyword.
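One simple way to turn a user's history into an intention value for a keyword is to look at which directories that user chose in the past when using the same keyword. This is a minimal sketch assuming a hypothetical history of (keyword, directory) pairs; the description does not say how the intention value is actually computed.

```python
from collections import Counter


def user_intent_weights(history, keyword):
    """Estimate which directories a user means by a keyword (hypothetical model).

    history: list of (keyword, directory) pairs from the user's past searches.
    Returns a mapping from directory to its share of the user's past uses of
    the keyword, or an empty dict if the user never used it.
    """
    kw = keyword.lower()
    counts = Counter(d for k, d in history if k.lower() == kw)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()} if total else {}
```

A user who searched "jaguar" twice for cars and once for animals would get weights of 2/3 for cars and 1/3 for animals, which later steps could use to favour the car directory.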

Directory analysis (dID) is the process of finding the directory value of the web data intended by the information user, based on the search term analysis result and the information user ID analysis.

Record analysis (rID) is the process of analyzing the record of the web data intended by the information user, based on the search term analysis result, the information user ID analysis, and the directory analysis value. The service type, title, data type, size, and so on are analyzed.

Producer analysis (pID) enters the producer information value from the information producer server if the information consumer specifies a producer; if the information consumer does not specify a producer, it filters candidates by information producer attributes using the information producer server and assigns specific weights.

Container analysis (cID) analyzes the container information directly when the information consumer designates a web site; if the user does not designate a web site, it compares the web data with its web site and assigns a weight accordingly.

Folksonomy analysis (fID) is the process of finding web data that matches the information user's intention and calculating the popularity scores of these web data among the mass of web users. The web log is used to select the folksonomy value.

Network analysis (nID) is the process of analyzing the information network of the web data finally selected through folksonomy analysis and of their tag keywords.
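The producer, container, folksonomy, and network analyses each contribute a weight to candidate documents. The description names these factors but not how they are combined; the sketch below assumes a linear combination of scores already normalised to [0, 1], with hypothetical weights in `WEIGHTS`.

```python
# Hypothetical weights: the description names the factors but not how they
# are combined, so a weighted sum is used here purely as an assumption.
WEIGHTS = {"producer": 0.3, "container": 0.3, "folksonomy": 0.3, "network": 0.1}


def rank_candidates(candidates):
    """Sort candidate documents by a weighted sum of normalised factor scores.

    Each candidate is a dict with a 'doc_id' and one score in [0, 1] per
    factor named in WEIGHTS; a missing factor counts as 0.
    """
    def score(doc):
        return sum(w * doc.get(name, 0.0) for name, w in WEIGHTS.items())
    return sorted(candidates, key=score, reverse=True)
```

A linear combination keeps each factor's influence easy to inspect and tune; a production system might instead learn the weights from user feedback.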

Editing and output is the final process of mapping the results to the data stored in the indexer and outputting them to the information user.

As such, the search engine unit 740 according to an embodiment of the present invention implements a search algorithm: for example, it provides a search box to the information user through the interface unit 700, analyzes the various search terms received through the search box, maps the analysis results to the data constructed in the DB 130a of FIG. 1 or the storage unit 720, and, under the control of the controller 710, provides the mapping results to the information user through the interface unit 700.

For convenience of description, referring to FIG. 10 together with FIGS. 1 and 7, the information retrieval apparatus 130 receives search box information, including the search entity, directory, record, and intention fields entered by the information user (S1010). To receive this information, the information retrieval apparatus 130 may drive the search engine to implement its algorithm and provide information about the search box to the information user.

Next, the information retrieval apparatus 130 determines the level to which the search terms in the directory and record fields belong, treats it as the highest level, and searches for content that includes the search term entered in the search entity field based on the directory and record levels so determined (S1020, S1030). Here, searching may mean searching the DB 130a of FIG. 1 or the storage unit 720 of FIG. 7.

The information retrieval apparatus 130 then restricts the directory and record levels to lower levels and re-searches within the range of the retrieved content (S1040). Through this process, the information retrieval apparatus 130 narrows the first retrieved content down further.
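Steps S1020 to S1040 amount to searching at a high directory level and then re-searching within the results at progressively lower levels. The sketch below assumes, hypothetically, that each document's directory is stored as a path from the highest to the lowest level, handles only the directory hierarchy (record levels could be narrowed the same way), and stops once the result set is within a preset number, as in the stopping condition of the claims.

```python
def narrowing_search(docs, entity, directory_path, max_results=1):
    """Search at the highest directory level, then re-search within the results
    at progressively lower levels (a sketch of steps S1020 to S1040).

    directory_path: the directory search term expanded to a path from the
    highest level to the lowest, e.g. ("entertainment", "drama", "beauty").
    Narrowing stops once the result set is within max_results.
    """
    results = [d for d in docs if entity.lower() in d["title"].lower()]
    for depth in range(1, len(directory_path) + 1):
        prefix = tuple(directory_path[:depth])
        narrowed = [d for d in results if tuple(d["directory"][:depth]) == prefix]
        if not narrowed:
            break  # design choice: keep the last non-empty result set
        results = narrowed
        if len(results) <= max_results:
            break  # preset number reached, stop re-searching
    return results
```

Keeping the last non-empty result set avoids returning nothing when the user's directory term is more specific than any indexed document.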

Steps S1030 and S1040 may be regarded as a process of searching, from the four categories entered in the search box, the information data constructed according to the eight categories. In this process, the information retrieval apparatus 130 may perform the search based on mutual qualification. Since this has been described in detail above, further description is omitted.

As a result, the information retrieval apparatus 130 can search for information accurately and quickly based on the mutual qualification of web categories that embody intention and context in web communication.

Meanwhile, even though all the components constituting the embodiment of the present invention are described as being combined into one or operating in combination, the present invention is not necessarily limited to this embodiment. That is, within the scope of the present invention, one or more of the components may be selectively combined and operated. In addition, although each of the components may be implemented as independent hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. Codes and code segments constituting the computer program can be easily inferred by those skilled in the art. Such a computer program may be stored in a computer-readable medium and read and executed by a computer to implement embodiments of the present invention. Storage media for the computer program may include magnetic recording media, optical recording media, carrier wave media, and the like.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described above. Various modifications can be made by those skilled in the art to which the invention pertains without departing from the spirit of the present invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or scope of the present invention.


Claims

1. An information retrieval apparatus comprising:

an interface unit providing, to a user terminal device, information on a search window including a search entity field, a directory field, a record field, and an intention field;

a storage unit including a database in which content information indexed and classified into a plurality of search categories is recorded; and

a control unit sequentially searching the plurality of search categories using the search words input for each field when search words are input for each field in the search window.
2. The information retrieval apparatus of claim 1,

wherein the control unit

determines a directory level to which the directory search word entered in the directory field belongs and a record level to which the record search word entered in the record field belongs, searches the plurality of search categories for content including the search word entered in the search entity field based on the determined directory level and record level,

and restricts the directory level and the record level to lower levels using the searched content, and re-searches the content using the restricted directory level and record level within the range of the searched content.
3. The information retrieval apparatus of claim 1,

wherein the control unit determines the intention of the user by analyzing the searched content, restricts the directory level and the record level to lower levels using the user's intention, and filters the searched content for the plurality of search categories using the restricted directory level and record level.
4. The information retrieval apparatus of claim 3,

wherein the control unit

defines the level of each field using the search words entered in the search entity, directory, record, and intention fields, and determines the intention of the user according to the defined levels.
5. The information retrieval apparatus of claim 1,

wherein the control unit,

if the searched content includes content matching the user's intention, stops re-searching the content and controls the interface unit to provide a search result screen including the searched content to the user terminal device.
6. The information retrieval apparatus of claim 1,

wherein the control unit

stops re-searching the content if the number of searched contents is within a preset number, and controls the interface unit to provide a search result screen including the searched content to the user terminal device.
7. The information retrieval apparatus of any one of claims 1 to 6,

wherein the plurality of search categories include keyword, user, directory, record, producer, container, folksonomy, and network.
8. An information retrieval method comprising:

providing information on a search window including a search entity field, a directory field, a record field, and an intention field;

when a search word is input for each field in the search window, determining a directory level to which the directory search word input in the directory field belongs and a record level to which the record search word input in the record field belongs;

sequentially searching for content including the search word input in the search entity field within a plurality of preset search categories based on the determined directory level and record level; and

restricting the directory level and the record level to lower levels using the searched content, and re-searching the content using the restricted directory level and record level within the range of the searched content.
9. The information retrieval method of claim 8,

wherein re-searching the content comprises:

analyzing the searched content to determine the intention of the user;

restricting the directory level and the record level to lower levels using the intention of the user; and

filtering the searched content for the plurality of search categories using the restricted directory level and record level.
10. The information retrieval method of claim 9,

wherein determining the intention of the user comprises

defining the level of each field using the search words input in the search entity, directory, record, and intention fields, and determining the intention of the user according to the defined levels.
11. The information retrieval method of claim 9, further comprising:

if the searched content includes content corresponding to the intention of the user, stopping the re-search of the content and providing a search result screen including the searched content.
12. The information retrieval method of claim 8, further comprising:

stopping the re-search of the content if the number of searched contents is within a preset number, and providing a search result screen including the searched content.
13. The information retrieval method of any one of claims 8 to 12,

wherein the plurality of search categories include keyword, user, directory, record, producer, container, folksonomy, and network.
14. A computer-readable recording medium storing a program for executing an information retrieval method,

the information retrieval method comprising:

providing information on a search window including a search entity field, a directory field, a record field, and an intention field;

when a search word is input for each field in the search window, determining a directory level to which the directory search word input in the directory field belongs and a record level to which the record search word input in the record field belongs;

searching for content including the search word input in the search entity field within a plurality of search categories based on the determined directory level and record level; and

restricting the directory level and the record level to lower levels using the searched content, and re-searching the content using the restricted directory level and record level within the range of the searched content.
15. The computer-readable recording medium of claim 14,

wherein the plurality of search categories include keyword, user, directory, record, producer, container, folksonomy, and network.