KR20020071418A - Information searching system and method thereof

Info

Publication number: KR20020071418A
Application number: KR1020010011565A
Authority: KR
Inventors: 김시환
Original assignee: 김시환
Priority date: 2001-03-06
Filing date: 2001-03-06
Publication date: 2002-09-12
Also published as: KR100421530B1

Abstract

PURPOSE: An information retrieval system and method are provided that assign a word code to a search word according to a rule, search a database for the word code, separate the retrieved information into information identical to the word code of the search word, information having a main component word code identical to the main component word code of the search word, and information having a component word code identical to a component word code of the search word, weight the retrieved information according to a matching value, and display the retrieved information in weighted order.

CONSTITUTION: The method comprises the steps of: determining whether an input search word consists of two or more words (S8100-S8110); if the input search word is a single word, converting the search word into the corresponding word code, searching a database based on the word code, and retrieving information identical to the search word code or information whose word code is most similar to it (S8130-S8140); giving a serial number to the retrieved information (S8150); if the input search word consists of two or more words, determining whether a subject word is separable from the search word (S8120); if the subject word is separable, converting the search word into a word code, searching the database based on the word code, and retrieving information identical to the search word code or information whose word code is most similar to it (S8160); giving a serial number to the retrieved information (S8170); giving a matching value to the retrieved information according to whether it has a word code identical to the search word code, a main component word code identical to the main component word code of the search word, or a component word code identical to a component word code of the search word; determining the ranking of the retrieved information based on the matching value; and, if the search word has two or more words and a subject word is separable from it, separating the subject word from the general words, giving different matching values according to whether the retrieved information has the subject word or a general word, and determining the ranking of the retrieved information based on the matching value.

Description

Information retrieval system and its method {INFORMATION SEARCHING SYSTEM AND METHOD THEREOF}

The present invention relates to an information retrieval method and, more particularly, to a method that, when retrieving information by means of word codes, calculates a matching value between the search word and each piece of retrieved information and ranks the retrieved information accordingly.

In recent years, the exchange of information over the Internet has increased rapidly, and various search engines have accordingly been developed to find desired information on the Internet quickly and accurately.

However, current search engines retrieve only information that matches the input word. Internet users who do not know the word that matches the information they are looking for may therefore be unable to find it easily, and so cannot locate the desired information quickly and accurately.

In particular, when a large number of items are retrieved, it becomes even more difficult to determine which item best matches the search word.

Therefore, an object of the present invention is to provide an information retrieval system and method with which the information that best matches the search word can easily be found among the desired information that has been retrieved quickly and accurately.

Another object of the present invention is to provide the retrieved information efficiently by ranking it according to the degree to which it matches the search word.

FIG. 1 is a block diagram of an information retrieval system according to an embodiment of the present invention.

FIGS. 2 to 6 are flowcharts illustrating a method of ranking retrieved information according to an embodiment of the present invention.

To achieve the above technical object, an information retrieval method according to a feature of the present invention classifies all words representing information into basic words and compound words, encodes a search word according to a set rule, searches the database on the basis of the encoded search word to retrieve information that is identical to or most closely matches the word code, and ranks the retrieved information. In ranking, the method distinguishes among information identical to the word code of the search word, information whose main constituent word code is the main constituent word code of the search word, and information that has a constituent word code of the search word as a constituent word code, assigns a different matching value to each, and ranks each piece of retrieved information.

In the processing apparatus, when the input search word consists of two or more words and its subject word can be identified, the subject word and the general words of the search word are distinguished from each other and given different matching values, and each piece of retrieved information is ranked accordingly.

Accordingly, information whose subject word has a word code identical to or most closely matching the word code of the subject word of the search word is distinguished from information that has component word codes matching the component word codes of the general words of the search word, and each is given a different matching value.

Meanwhile, information that has the same main constituent word code as the search word but none of its constituent word codes, and information that lacks the main constituent word code of the search word but has all of its constituent word codes as constituent word codes, are given the same priority.

Furthermore, priority is given in the following order: information identical to the search word; information containing the search word; and information that has the same subject word as the search word and some of the word codes of its general words. Next, information that matches only the subject word of the search word and information that matches only, but all of, its general words share the same priority. The next priority goes to information that partially matches the subject word of the search word or partially matches its general words, in that order.

The retrieved information is therefore listed in order of the magnitude of the matching value.

Hereinafter, the most preferred embodiment of the present invention is described in detail so that a person of ordinary skill in the art to which the invention pertains can readily practice it.

The present invention provides a ranking of the results retrieved in a concept search that uses the meanings of words.

In general, to explain a word is to describe its meaning in detail, and the meaning so described can be encoded according to a unified rule. Most words can be explained using a fixed number of predetermined basic words, and the word code of a word can be generated by encoding these basic words as codes with a fixed number of characters. A word code is thus the meaning of a word written out as a sequence of basic word codes.

For patents on methods of retrieving information using such word codes, reference may be made to the present inventor's earlier-filed patent application No. 2000-69722, among others.

If basic concepts capable of explaining the "words of the world" are established, and the "words of the world" are explained by combinations of these basic concepts, then those basic concepts are the basic words of the present invention. A word code is therefore a word expressed as a combination of basic word codes, and each basic word code corresponds to one meaning.

In the embodiment of the present invention, all words representing information are divided into skeleton words, that is, basic words, and compound words, and each word is encoded in terms of basic words to generate the corresponding word code.

Once all information has been encoded and stored according to this coding rule, searching with word codes amounts to searching by meaning, and may therefore be called a concept search.

To support natural-language search, however, the method must be applicable to sentences, and to be used for sentence search it must take into account the role each word plays within its sentence.

Because the word code of the present invention represents a sentence or word conceptually, the most important element of a sentence is its subject word. In the present invention, therefore, a separate role code is assigned only to the subject word.

The subject word of a sentence can be identified through analysis of particles (postpositions), morphological analysis of words, semantic analysis based on the connections between words, analysis of word position, and the like; these methods follow established linguistic theory.

The subject word is the word that serves as the grammatical subject of a sentence. If the search word is not a sentence but a phrase of several words, the word that is ultimately modified is the subject word.

Automatic, program-based selection of the subject word can make use of the various existing word-processing programs built on this theory. Indeed, theories for analyzing the role of each word in a sentence are already used in translation programs and the like.

Moreover, sentence search does not require every word of a sentence to be converted into a word code. It is enough to convert only certain important words, such as nouns, adjectives, and verbs. This is because searching conceptually is far more efficient, and a search based only on the important words of a sentence is sufficient for concept search.

Because the present invention searches with the subject word of the search word distinguished, the subject word must be marked. That is, when a sentence contains a subject word, the role code "S" (subject word) is attached to the word code of the subject word.

An example of constructing a word code follows. The basic word codes that make up a word code are given a fixed number of characters so that comparisons during a search are easy to program. For example, in the word code "nmamkpo-fstelolor", apart from the leading "n", which indicates that the part of speech is a noun, the remaining basic word codes "ma, mk, po, -f, st, el, ol, or" all consist of two characters, which makes word codes easy to compare with one another.
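The fixed two-character width can be illustrated with a minimal Python sketch. The function name `split_word_code` is an assumption made for this illustration and does not appear in the specification; the sketch also assumes a plain word code with no "S" role code and no "=" predicate marker.

```python
from __future__ import annotations


def split_word_code(word_code: str) -> tuple[str, list[str]]:
    """Split a word code into its part-of-speech prefix and its
    two-character basic word codes (hypothetical helper)."""
    pos, body = word_code[0], word_code[1:]
    if len(body) % 2 != 0:
        raise ValueError(f"{word_code!r} is not a prefix plus two-character codes")
    # Walk the body in steps of two characters; "-f" counts as one code.
    return pos, [body[i:i + 2] for i in range(0, len(body), 2)]


pos, codes = split_word_code("nmamkpo-fstelolor")
print(pos, codes)
# n ['ma', 'mk', 'po', '-f', 'st', 'el', 'ol', 'or']
```

Because every basic word code has the same width, two word codes can be compared unit by unit without any dictionary lookup.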

In addition, the position of each constituent word code within the word code is fixed so that the best-matching information can be found more easily. Specifically, the basic word code of a modifier is placed immediately after the main constituent word code of the subject word it modifies, and basic word codes with an adverbial function are placed after "-".

For example, the word "valve" may be given the meaning "an organ (or, organ) that controls (co, control) the flow (fl, flow) of blood (bl, blood) in (-i, in) the heart (ha, heart)", from which the word code "menor=coblfl-ha" can be generated. As in this word code, the code "=" is placed before a word code that functions as a verb or predicate so that such codes can be distinguished. This avoids the difficulty of distinguishing predicate codes that would arise if word codes were simply compared two characters at a time.

FIG. 1 shows the structure of an information retrieval system according to an embodiment of the present invention.

As shown in FIG. 1, the information retrieval system 10 according to an embodiment of the present invention (hereinafter called the information retrieval server for convenience) includes: an input unit 11 for entering words or sentences corresponding to the information sought; a central processing unit 22 that breaks the words or sentences entered through the input unit 11 (hereinafter called the search word) down into basic words, encodes them, and searches for the corresponding information on the basis of the encoded search word; a database 23 storing a large amount of information that has been broken down into basic words and encoded; and a display unit 14 that displays the search word entered through the input unit 11 and the results retrieved by the central processing unit 22.

As shown in FIG. 1, the information retrieval server 10 can be connected to a network (a wired or wireless network, a future network, or the like), for example the Internet 20, and is connected to an information input device 30 through the Internet 20. The server therefore further includes an interface unit 15 that exchanges data with the information input device 30 under the control of the central processing unit 22.

The information retrieval server 10 builds the database 23 by breaking down and encoding a large amount of information according to the set rule. On the basis of the database 23, it retrieves information corresponding to a search word received from the information input device 30 through the interface unit 15 or entered through the input unit 11, and either provides the result to the user's information input device 30 or displays it on the display unit 14.

Accordingly, the database 23 of the information retrieval server 10 consists of an operation database 131, which stores the data needed to operate the Internet site and the system, and a word database 132, in which a large amount of information is stored after being broken down into basic words and encoded.

The central processing unit 22 includes: a site operation unit 121 that operates the site and the system on the basis of the data stored in the operation database 131; a data processing unit 122 that breaks information entered through the input unit 11 down into basic words, encodes it, and stores it in the word database 132, and likewise breaks down and encodes the data entered through the input unit 11 or the interface unit 15, that is, the search word; and a data search unit 123 that searches the word database 132 on the basis of the search word processed by the data processing unit 122 to find the information corresponding to it.

A computer is used as the information input device 30 that connects to the information retrieval server 10 according to the embodiment of the present invention, but any other communication device that can connect to the Internet 20 may also be used.

First, the way in which the words or sentences that make up information are encoded in an information retrieval system of this structure is described.

In the present invention, the search word is broken down into basic words, and each basic word is encoded as a code made up of letters of the alphabet, Arabic numerals, or the like. The search word may be a single word, or a clause, phrase, or sentence of two or more words. The encoded search word is called the word code of the search word, and each two-character code that makes up a word code is called a component word code.

Component word codes are further divided into the main constituent word code and constituent word codes: the component word code that plays the subject-word role within the word code is the main constituent word code, and all other component word codes are constituent word codes.
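The distinction between the main constituent word code and the remaining constituent word codes can be sketched in Python as follows. The type `WordCode` and the function `analyze` are hypothetical names introduced for this illustration. The sketch assumes that the main constituent word code is the first two-character unit after the part-of-speech prefix, which is how the "engine" example in this description is laid out; the specification does not state this as a general rule.

```python
from typing import List, NamedTuple


class WordCode(NamedTuple):
    """Hypothetical container for the parts of one word code."""
    pos: str                 # part-of-speech prefix, e.g. "n" for noun
    main: str                # main constituent word code
    constituents: List[str]  # all other component word codes


def analyze(word_code: str) -> WordCode:
    pos, body = word_code[0], word_code[1:]
    units = [body[i:i + 2] for i in range(0, len(body), 2)]
    # Assumption: the first unit after the prefix is the main constituent.
    return WordCode(pos, units[0], units[1:])


print(analyze("nkn-iscinan"))
# WordCode(pos='n', main='kn', constituents=['-i', 'sc', 'in', 'an'])
```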

For example, the sentence "In the 2000s, engine technology is becoming more electronic day by day." can be encoded as "In the 2000s, engine (nmamkpo-fstelolor) technology (nkn-iscinan) is becoming (vbc) more electronic (nel) day by day." The subject of this sentence is "technology", so the subject-word role code "S" is attached to the word code of "technology".

Here, "engine" can be explained with basic words as "a machine (ma, machine) that makes (mk, make) power (po, power) from (-f, from) steam (st, steam), electricity (el, electric), or (or) oil (ol, oil)". Selecting and encoding only the important words gives the word code "nmamkpo-fstelolor". The leading "n" indicates that the word "engine" is a noun. After this part-of-speech code comes "ma", the code of the main constituent word "machine", which is the word being modified; then "mk", the code of the modifying constituent word "makes"; and then "po", the code of the object "power". The adverbial basic word codes "fstelolor" follow the "-". Each basic word is represented by a two-character code, and the final "or" indicates that "st", "el", and "ol" are joined by logical OR.

Likewise, "technology" can be expressed with basic words as "knowledge (kn, know) in science (sc, science) and (a, and) industry (in, industry)", which, following the coding rule described above, gives the word code "nkn-iscinan". Here too, the leading "n" indicates that "technology" is a noun, and the final "an" indicates that "sc" and "in" are joined by logical AND.

Converting the sentence to word codes therefore gives: "In the 2000s (nyr), engine (nmamkpo-fstelolor) technology (nkn-iscinanS) is becoming (vbc) more electronic (nel) day by day."
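The sentence encoding above can be sketched in Python as follows. The lookup table `LEXICON`, the function `encode_sentence`, and the English tokenization are all assumptions made for this illustration; only the word codes themselves come from the example. Words without an entry stand in for the unimportant words that the description says need not be encoded.

```python
# Hypothetical lookup table: English glosses mapped to the example's word codes.
LEXICON = {
    "year": "nyr",
    "engine": "nmamkpo-fstelolor",
    "technology": "nkn-iscinan",
    "electronic": "nel",
    "becoming": "vbc",
}


def encode_sentence(words, subject=None):
    """Encode the important words of a sentence and append the role
    code "S" to the word code of the subject word."""
    codes = []
    for word in words:
        code = LEXICON.get(word)
        if code is None:
            continue  # unimportant word: left out of the encoded form
        if word == subject:
            code += "S"  # role code marking the subject word
        codes.append(code)
    return " ".join(codes)


words = ["year", "engine", "technology", "daily", "electronic", "becoming"]
print(encode_sentence(words, subject="technology"))
# nyr nmamkpo-fstelolor nkn-iscinanS nel vbc
```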

Suppose further that the sentence representing the information is "President (npr) Clinton, who is in the United States, lives (vli) in the White House (nhoofpr-ius), too busy to pause." Here "Clinton" is a proper noun (C) and "President" is the subject (S). Encoding the sentence therefore gives the word codes "us Clinton(C) nprS nhoofpr-ius vli".

Thus, in the present invention, encoding a sentence means selecting only the important, meaning-bearing words in the sentence, converting them to word codes, and marking the subject word. Sentence punctuation such as the period is retained as-is so that one sentence can be distinguished from the next.

Note that, because "Clinton" is a proper noun, it was given the proper-noun code "C" above, so that the original word is used directly instead of being converted to a word code. Alternatively, the word "Clinton" could be given a word code meaning "the 00th President of the United States", or a code could be assigned to "Clinton" itself.

The role of each word in a sentence can be determined through analysis of particles (postpositions), morphological analysis of words, semantic analysis based on the connections between words, analysis of word position, and the like; these methods follow established linguistic theory.

Automatic, program-based analysis can make use of the various existing word-processing programs built on this theory. Indeed, theories for analyzing the role of each word in a sentence are already used in translation programs and the like.

Next, a method of ranking the information retrieved from a database in which a large amount of information has been encoded in this way is described.

FIGS. 2a to 2e show the flow of the method of ranking retrieved information according to the present invention.

As shown in the accompanying FIG. 2a, when a search word is entered through the input unit 11 or the interface unit 15, the data processing unit 122 of the central processing unit 12 first determines whether the search word consists of two or more words (S8100 to S8110).

If the search word is a single word, the data processing unit 122 converts it to the corresponding word code, and the data search unit 123 then searches the word database 132 on the basis of that word code to retrieve the corresponding information. The search retrieves information whose word code is identical to, or most closely matches, the word code of the search word (S8130 to S8140).

Retrieving information whose word code is identical to or most closely matches the word code of the search word means retrieving information whose word code either contains the word code of the search word in full or contains only part of it.

Once information has been retrieved, the retrieved items are listed and numbered from 1 to n (S8150).
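The single-word retrieval and numbering steps (S8130 to S8150) might be sketched in Python as below. This is an illustration under stated assumptions, not the specification's implementation: the function names are hypothetical, the "database" is a plain list of word codes, and "contains part of the word code" is interpreted as sharing at least one two-character component word code with the search word. The record "nmamkpo" is a made-up entry used only to show a partial match.

```python
def component_codes(word_code):
    """Two-character component word codes after the part-of-speech prefix."""
    body = word_code[1:]
    return [body[i:i + 2] for i in range(0, len(body), 2)]


def retrieve(query_code, database):
    """Return (serial number, word code) pairs for records that are
    identical to the query code or share part of it."""
    query_units = set(component_codes(query_code))
    hits = [
        record for record in database
        if record == query_code or query_units & set(component_codes(record))
    ]
    # Number the retrieved items from 1 to n, as in step S8150.
    return list(enumerate(hits, start=1))


database = ["nmamkpo-fstelolor", "nkn-iscinan", "nmamkpo"]  # illustrative records
print(retrieve("nmamkpo-fstelolor", database))
# [(1, 'nmamkpo-fstelolor'), (2, 'nmamkpo')]
```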

If, on the other hand, the search word consists of two or more words, it is determined whether the subject word of the search word can be identified (S8120).

The algorithm that identifies the subject word is executed by a predetermined program in the central processing unit 22.

If the search word is a sentence, its grammatical subject is the subject word; if it is not a sentence, the word in the modified position is the subject word. If the search word is not a sentence but merely a sequence of two or three words, the subject word may be chosen by position, or no subject word may be chosen at all. For such a plain sequence of words, the system may ask the user on screen which word is the subject word, or it may carry out the search on the assumption that the search word has no subject word.

For example, if the search word is "engine (nmamkpo-fstelolor) technology (nkn-iscinanS)", "technology" may be taken as the subject word, or the search word may be treated as having no subject word. Because identifying and selecting the subject word can be ambiguous, a random, probabilistic choice may be used in such cases.

If the search word consists of two or more words and the subject word can be identified, the data processing unit 122 converts the search word to the corresponding word code, and the data search unit 123 then searches the word database 132 on the basis of that word code to retrieve the corresponding information. The search retrieves information whose word code is identical to, or most closely matches, the word code of the search word (S8160).

Once information has been retrieved, the retrieved items are listed and numbered from 1 to n (S8170).

FIG. 2b is a flowchart of ranking the retrieved information when the search word is a single word.

The ranking algorithm is executed by the database 23, in which the various information is stored, and the central processing unit 22, in which the algorithm is programmed.

제 1번 부터 제 n 번까지 번호가 부여된 정보에서, n 순서를 가진 정보를 선택한다.(S8200)From the information numbered from 1st to nth, information having n order is selected. (S8200)

그런 다음, 정보 n 이 검색어의 단어 코드와 동일한 단어 코드를 가진 정보인가를 판단한다.(S8210) 만일 동일한 단어 코드를 가진 정보이면, 상기의 해당 정보 n 의 일치값(Cn)은, Cn = 210 이 된다.(S8220)Then, it is determined whether the information n is information having the same word code as the word code of the search word. (S8210) If the information has the same word code, the coincidence value Cn of the corresponding information n described above is Cn = 210. (S8220)

For example, if the search term is "engine (nmamkpo-fstelolor)", information whose word code is identical to the word code of "engine" receives a match value of 210.

If the word code is not identical to that of the search term, it is determined whether the information has the same main constituent word code as the search term (S8230).

When the search term is "engine (nmamkpo-fstelolor)", the main constituent word code is "ma", the basic word code denoting a machine. In other words, the check is whether the information has a word code whose main constituent word code is "ma".

If the information does not have the main constituent word code of the search term (in the example above, if its main constituent word code is not "ma"), a match value Vn is calculated according to how many constituent word codes match, and the match value Cn of the information is set to Vn (S8240).

The match value for constituent word codes is calculated as follows. Information that has all of the search term's constituent word codes receives a match value of 100. Thus, if the search term has k constituent word codes, each matching constituent word code contributes 100/k to Vn, and two matching constituent word codes give twice 100/k. For example, if there are four constituent word codes and two of them match, Vn = 100/4 × 2 = 50.
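The Vn calculation can be illustrated with a short sketch. This is only an illustration of the arithmetic described above, not part of the disclosed embodiment; the function name and the representation of constituent word codes as lists of strings are assumptions.

```python
def constituent_match_value(query_codes, item_codes):
    """Vn: each of the query's k constituent word codes found in the item adds 100/k."""
    query = set(query_codes)          # assumed: codes are plain strings such as "mk"
    if not query:
        return 0.0                    # no constituent codes to compare
    matched = len(query & set(item_codes))
    return 100.0 / len(query) * matched


# Four constituent word codes, two of which match (the example in the text).
print(constituent_match_value(["mk", "po", "st", "el"], ["po", "el", "xx"]))  # 50.0
```

Treating the codes as a set means that a code repeated within one item is counted only once, which is one possible reading of "the number of matching constituent word codes".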

On the other hand, if the information does have the search term's main constituent word code as its main constituent word code, it is determined whether it also has any of the search term's constituent word codes (S8250).

If information n has none of the search term's constituent word codes, its match value is Cn = 100 (S8270).

That is, when the search term is "engine (nmamkpo-fstelolor)" and the retrieved information has "ma" as its main constituent word code but none of the other constituent word codes "mk, po, st, el, ol", the match value Cn of information n is 100.

Excluding the main constituent word code, the word code of "engine" consists of "mk, po, -f, st, el, ol, or". However, "-f" is a word code meaning "from" and "or" is a word code denoting logical OR; such stop words are omitted from the constituent word codes. The constituent word codes of "engine (nmamkpo-fstelolor)" are therefore "mk, po, st, el, ol".
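The decomposition just described can be sketched as follows. The sketch rests on several assumptions that the text does not state as rules: that the first character is the part-of-speech code, that the remainder splits into two-character basic word codes, that the first of these is the main constituent word code, and that a trailing uppercase letter is a role marker such as "S". It handles only simple codes (no parentheses or "="), and the stop-code set contains only the two codes named above. The function name is hypothetical.

```python
STOP_CODES = {"-f", "or"}  # only the two stop codes named in the text


def split_word_code(code, stop_codes=STOP_CODES):
    """Split a simple word code into (part of speech, main, constituents, role)."""
    role = ""
    if code[-1].isupper():            # assumed: trailing uppercase letter is a role marker
        code, role = code[:-1], code[-1]
    pos, body = code[0], code[1:]     # assumed: first character is the part of speech
    units = [body[i:i + 2] for i in range(0, len(body), 2)]
    main, rest = units[0], units[1:]  # assumed: first unit is the main constituent
    constituents = [u for u in rest if u not in stop_codes]
    return pos, main, constituents, role


print(split_word_code("nmamkpo-fstelolor"))
# ('n', 'ma', ['mk', 'po', 'st', 'el', 'ol'], '')
```

Passing a larger `stop_codes` set would be needed for codes that contain other stop words; which codes count as stop words beyond "-f" and "or" is not specified here.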

If the information has the search term's main constituent word code and also has some of its constituent word codes, the constituent match value Vn is calculated according to how many constituent word codes match. The match value Cn of information n is then:

Cn = 100 + Vn (S8260)

That is, when the search term is "engine (nmamkpo-fstelolor)" and the word code of the retrieved information has "ma" as its main constituent word code together with some or all of the other constituent word codes "mk, po, st, el, ol", Vn is calculated from the degree of match of the constituent word codes, and Vn plus 100 becomes the match value Cn of information n.

Once match values have been obtained in this manner for every item through the last retrieved item (S8275), the retrieved information is ranked by the magnitude of its match value Cn (S8280).
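The single-word procedure of FIG. 2b, as described in steps S8200 to S8280, can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: each word is represented as a dictionary with hypothetical keys "code", "main", and "parts" (the constituent word codes with stop words already removed), and the sample items are invented for the example.

```python
def constituent_value(query_parts, item_parts):
    """Vn: 100/k for each of the query's k constituent codes found in the item."""
    query = set(query_parts)
    if not query:
        return 0.0
    return 100.0 / len(query) * len(query & set(item_parts))


def match_value_single(query, item):
    """Match value Cn of one item against a single-word search term."""
    if item["code"] == query["code"]:
        return 210.0                      # S8220: identical word code
    vn = constituent_value(query["parts"], item["parts"])
    if item["main"] != query["main"]:
        return vn                         # S8240: main constituent differs
    return 100.0 + vn                     # S8260 (equals 100 when vn is 0, S8270)


def rank(query, items):
    """Items sorted by descending Cn (S8280)."""
    return sorted(items, key=lambda it: match_value_single(query, it), reverse=True)


ENGINE = {"code": "nmamkpo-fstelolor", "main": "ma",
          "parts": ["mk", "po", "st", "el", "ol"]}
# Hypothetical retrieved items.
ITEMS = [
    {"code": "nxxmkpostelol", "main": "xx", "parts": ["mk", "po", "st", "el", "ol"]},
    {"code": "nmamkpo", "main": "ma", "parts": ["mk", "po"]},
    ENGINE,
]
print([match_value_single(ENGINE, it) for it in ITEMS])  # [100.0, 140.0, 210.0]
```

Because Python's `sorted` is stable, items with equal match values keep their original relative order, which is one reasonable way to treat the equal-priority cases.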

Once ranked in this way, the information is listed in order of how closely it matches the search term.

In other words, priority is assigned in the following order: information whose word code is identical to that of the search term; information whose word code contains the word code of the search term; and information that shares the search term's main constituent word code and some of its constituent word codes.

Information that has the same main constituent word code as the search term but none of its constituent word codes, and information that lacks the search term's main constituent word code but has all of its constituent word codes as constituent word codes, rank equally, because the match value is 100 in both cases.

Next in priority come information having the search term's main constituent word code as its main constituent word code, and information having only some of the search term's constituent word codes as constituent word codes.

FIGS. 2c and 2d are flowcharts for ranking the retrieved information when the search term consists of two or more words and no subject word can be identified in it.

When the search term has two or more words, matching is judged within a single sentence. For example, if the search term is "engine technology", information having the same word codes as the search term must contain both "engine" and "technology" in the same sentence. If "engine" appears in one sentence and "technology" in a different one, the information is not identical to the search term. Put differently, if a single sentence A contains both "engine" and "technology", the information containing sentence A is identical to the search term.

First, from the retrieved information items, which have been assigned arbitrary numbers from 1 to n, the item with index n is selected, and it is determined whether the selected item has at least one word code identical to a word code of the search term (S8300 to S8310).

For example, if the search term has two input words, this means information having one or both of their word codes. That is, if the search term is "engine (nmamkpo-fstelolor) technology (nkn-iscinanS)", the retrieved information has the word code of "engine", the word code of "technology", or both.

If k is the number of word codes in information n that are identical to word codes of the input words, the match value of information n is Cn = k × 210 (S8320).

Next, among the word codes of information n other than those identical to word codes of the search term, it is determined whether at least one has a main constituent word code of the search term as its main constituent word code (S8330).

For example, if the search term is "engine (nmamkpo-fstelolor) technology (nkn-iscinanS)", the check is made on those word codes of the retrieved information that are not identical to the word codes of "engine" and "technology": does any of them have a main constituent word code of the search term as its main constituent word code? In this example, that means determining whether there is at least one word code whose main constituent word code is "ma" or "kn".

If information n has one or more such main constituent word codes, and their number is j, the match value of information n becomes:

Cn = Cn + j × 100 (S8340)

That is, the match value of information n is the match value from step S8320 plus a term reflecting the number of main constituent word codes that match those of the search term.

If the information also has constituent word codes that match constituent word codes of the search term, the degree of match Vn for the constituent word codes is calculated and added. The match value is then:

Cn = Cn + Vn (S8345)

That is, Vn is added to the match value obtained in step S8340. The degree of match for the constituent word codes is calculated as in the example given in the description of FIG. 2b.

In the example search term, this is the case where the retrieved information has "ma" and "kn" as main constituent word codes and all or some of "mk, po, st, el, ol, sc, in" as constituent word codes.

If, among the word codes of information n that are not identical to word codes of the search term, none has a main constituent word code of the search term as its main constituent word code, only the degree of match Vn of the constituent word codes is calculated. The match value Cn of information n then becomes:

Cn = Cn + Vn (S8350)

That is, the match value Cn of information n is the match value obtained in step S8320 plus the degree of match Vn obtained in step S8350.
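Steps S8320 to S8350 can be sketched as follows for a search term with several words and no subject word. The sketch makes assumptions the text leaves open: an item is the list of words found in one sentence, each word is a dictionary with hypothetical keys "code", "main", and "parts", and Vn is computed over the union of the search term's constituent word codes against the item's words that are not exact matches. The stripped code "nkn-iscinan" and the sample items are illustrative only.

```python
def match_value_multi(query_words, item_words):
    """Match value Cn of one item (one sentence) against a multi-word search term."""
    query_codes = {w["code"] for w in query_words}
    query_mains = {w["main"] for w in query_words}
    query_parts = {p for w in query_words for p in w["parts"]}

    # k: word codes of the item identical to word codes of the search term.
    k = len({w["code"] for w in item_words} & query_codes)
    cn = 210.0 * k                                          # S8320

    # Remaining word codes of the item (not identical to any query word code).
    rest = [w for w in item_words if w["code"] not in query_codes]
    j = sum(1 for w in rest if w["main"] in query_mains)
    cn += 100.0 * j                                         # S8340 (adds 0 when j is 0)

    rest_parts = {p for w in rest for p in w["parts"]}
    if query_parts:
        cn += 100.0 / len(query_parts) * len(query_parts & rest_parts)  # S8345 / S8350
    return cn


ENGINE = {"code": "nmamkpo-fstelolor", "main": "ma",
          "parts": ["mk", "po", "st", "el", "ol"]}
TECH = {"code": "nkn-iscinan", "main": "kn", "parts": ["sc", "in"]}

print(match_value_multi([ENGINE, TECH], [ENGINE, TECH]))  # 420.0
```

With k equal to 0 the same function also covers the case of FIG. 2d, since only the main constituent and constituent terms then contribute.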

FIG. 2d shows the flowchart for ranking the retrieved information when information n has no word code identical to a word code of the search term.

It is determined whether the information has one or more main constituent word codes identical to those of the search term; the number of matching main constituent word codes is multiplied by 100, and the degree of match of the search term's constituent word codes is added to obtain the match value of information n. If the information has none of the search term's main constituent word codes, the match value of information n is obtained from the degree of match of the constituent word codes alone (S8360 to S8380).

Steps S8360 to S8380 follow the same principle as steps S8330 to S8350, so a detailed description is omitted here.

Once match values have been obtained for all retrieved items numbered 1 to n (S8355), the retrieved items are ranked and listed in order of their match values Cn (S8390).

When the retrieved information is listed by match value, it appears in order of how closely it matches the search term, beginning with information that has the same word codes as the search term.

FIG. 2e is a flowchart for ranking the retrieved information when the search term consists of two or more words and the subject word can be identified.

Once the subject word has been selected from the search term, information whose word code is identical to, or most closely matches, the word code of the subject word is retrieved (S8400).

For example, if the search term is "the United States (nusS) during the period (nti-obeenan) of World War I (nwawofi)", assigning role codes yields the word codes "nwawofiA nti-obeenanA nusS".

Here the subject word is "United States", so the search retrieves information whose subject word has a word code identical to, or most closely matching, the word code "us" of "United States".

The word code of the First (fi) World (wo) War (wa) is "nwawofi". The word code of "period" is built from the phrase that describes it, "time (ti) of (-o) a beginning (be) and (an) end (en)", giving "nti-obeenan". The trailing "an" stands for "and", and the single leading character of each code denotes the part of speech.

As another example, if the search term is "the motor of the pump in the distillation tower", its word code is "ntwmk(gs-flq)(lq-fgs)or nma=pomvlqgsor nmamkmv-fpoS". The subject word is "motor", so the code "S", which marks the subject word, is appended to the word code of "motor".

For convenience of description, the words of a search term other than the subject word are referred to here as "general words". In the search term "the motor of the pump in the distillation tower", the general words are "distillation tower" and "pump".

In this case, step S8400 of FIG. 6 retrieves information whose subject word has a word code identical to, or most closely matching, "nmamkmv-fpoS" among the word codes of the search term.

Meanwhile, a "distillation tower" is "a tower (tw) that makes (mk) liquid (lq) into gas (gs) or gas into liquid"; a word code built on this meaning is "ntwmk(gs-flq)(lq-fgs)or". In the chemical industry, where distillation towers are used, the distillation tower is a basic process, so "distillation" is a basic word in that field. The word code of "distillation tower" can therefore also be built from the word code of the chemical-industry basic word "distillation (ds)", in which case it is "cindstw". The leading "ci" is the area code for the chemical industry, and "n" is the noun code.

The distillation tower thus has two word codes: "ntwmk(gs-flq)(lq-fgs)or", built from the meaning of the word, and "cindstw", built as a word of the chemical industry field.

Likewise, a "pump" is a machine (ma) that moves (mv) liquid (lq) or gas (gs) by power (po), which gives the word code "nma=pomvlqgsor". In this word code the basic word code "po" precedes "mv" because "po" modifies "mv"; that is, "po" and "mv" together modify "ma".

A "motor" is "a machine (ma) that makes (mk) movement (mv) from power (po) such as electricity (el)", so its word code is "nmamkmv-fpo". Here "mv" follows "mk" because "mv" is the object of "mk" rather than a modifier of it. The "n" at the start of the word code denotes a noun.

The retrieved information is listed, each item is assigned an arbitrary number from 1 to n, and the algorithm of FIG. 2b is used to obtain the match value Cn of information n with respect to the subject word of the search term (S8410 to S8420).

That is, if the input is "the motor of the pump in the distillation tower", the information retrieved in step S8400 is information whose subject word has a word code identical to, or at least partly matching, the word code "nmamkmv-fpo" of "motor". Obtaining the match value for the subject word of information n therefore means obtaining the match value between the word code "nmamkmv-fpo" and the subject word of information n. Because every item retrieved in step S8400 has a word code that at least partly matches "nmamkmv-fpo", this match value can always be computed.

The flowchart of FIG. 2b is an algorithm for obtaining the match value of a single-word search term, so it can be applied directly to obtaining the match value for the single subject word.

After the match value Cn for the subject word of retrieved information n has been obtained, and with C denoting the number of general words in the search term, the match value becomes Cn = Cn × C (S8430).

That is, the match value of step S8430 is the match value obtained in step S8420 multiplied by the number of general words in the search term. For instance, if the match value of retrieved information n for the subject word is 210 and the search term has two general words as above, the match value of information n in step S8430 is 420 (210 × 2 = 420).

Next, the word codes of the search term's general words are compared with the word codes of the general words of information n to determine whether any component word codes match (S8440 to S8450).

For example, if the retrieved information is "motor for moving liquid and gas", the word codes of the search term's general words, "distillation tower" and "pump", are compared with the word codes "nlq nga nmv" of the retrieved information's general words, "liquid and gas movement", to determine whether any component word codes match.

If matching component word codes exist, the algorithms of FIGS. 3, 4, and 5 are used to obtain the degree of match Vn between the general words of the search term and those of information n (S8460).

In the example, this means obtaining the degree of match between the word codes of the retrieved information's general words, "liquid and gas movement", and the word codes of the search term's general words, "distillation tower" and "pump".

The match value Cn of information n then becomes Cn = Cn + Vn (S8470).

That is, the final match value Cn of information n is the match value Cn obtained in step S8430 plus the degree of match Vn obtained in step S8460.

If no component word code matches between the general words of the search term and those of the retrieved information, the match value of information n remains the value Cn determined in step S8430.
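The arithmetic of steps S8430 to S8470 can be sketched as below. The subject-word match value and the degree of match Vn of the general words are taken as inputs, because the procedures that produce them (the single-word procedure of FIG. 2b and the algorithms of FIGS. 3, 4, and 5) are described elsewhere; the function and parameter names are hypothetical.

```python
def match_value_with_subject(subject_cn, general_word_count, general_vn=0.0):
    """Final Cn for a search term whose subject word has been identified.

    subject_cn: match value of the item's subject word against the search
                term's subject word (S8420).
    general_word_count: C, the number of general words in the search term.
    general_vn: degree of match Vn between general words (S8460); left at 0
                when no component word code matches.
    """
    cn = subject_cn * general_word_count   # S8430
    return cn + general_vn                 # S8470


# Subject word matches exactly (210) and the search term has two general words.
print(match_value_with_subject(210, 2))  # 420.0
```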

Once match values have been obtained in this way for all information retrieved for a search term that has two or more word codes and a subject word (S8475), the information is ranked by the magnitude of Cn (S8480).

The larger the match value, the closer the information is to the search term, so ranking by match value has the effect of listing the information in order of closeness to the search term.

Under the algorithm described above, the priority of the retrieved information is as follows.

Priority is given first, in order, to information identical to the search term, information containing the search term, and information whose subject word is identical to that of the search term and which has some of the word codes of the search term's general words. Next, information in which only the subject word is identical and information in which only the general words are all identical rank equally. After these come information partly matching the subject word of the search term and information partly matching its general words.

The present invention may be modified and practiced in various ways without departing from the scope of the claims set forth below.

As described above, according to the embodiments of the present invention, in a search system and method that retrieve information by concept, a match value between the search term and each retrieved item is calculated, which makes it easier to rank the retrieved information.

Ranking the information in this way makes it possible to select, from the retrieved information, the items identical to or most closely matching the search term, further improving the effectiveness of the search.

Claims

In a method in which all words representing information are classified into basic words and compound words, a search term is encoded according to a set rule, a database storing the search targets is searched on the basis of the encoded search term for information whose word code is identical to or most closely matches it, and the retrieved information is ranked,

an information retrieval method comprising: distinguishing among information whose word code is identical to the word code of the search term, information having the main constituent word code of the search term as its main constituent word code, and information having constituent word codes of the search term as constituent word codes, and assigning a different match value to each; and

ranking each item of retrieved information using the assigned match values.

In the step of giving a matching value of claim 1,

when the input search term consists of two or more words and the subject word can be identified in the search term,

the subject word and the general words of the search term are distinguished, different match values are assigned to the subject word and the general words, and the retrieved information is ranked.

In the step of giving a matching value of claim 1,

An information retrieval method characterized in that different match values are assigned by distinguishing information having a word code identical to, or most closely matching, the word code of the subject word of the search term from information having component word codes that match component word codes of the general words of the search term.

In the step of giving a matching value of claim 1,

An information retrieval method characterized in that information that has the same main constituent word code as the search term but none of the search term's constituent word codes, and information that lacks the search term's main constituent word code but has all of the search term's constituent word codes as constituent word codes, are given equal priority.

In the step of giving the coincidence value of claim 1,

An information retrieval method characterized in that priority is given first, in order, to information identical to the search term, information containing the search term, and information whose subject word is identical to that of the search term and which has some of the word codes of the search term's general words; information in which only the subject word is identical and information in which only the general words are all identical are given equal priority; and the next priority is given to information partly matching the subject word of the search term and information partly matching the general words of the search term.

The method according to any one of claims 1 to 6,

wherein the retrieved information is ranked in order of the magnitude of the match value.