KR20000036758A

KR20000036758A - A method for establishing database for searching files and a method for searching file by use of the database

Info

Publication number: KR20000036758A
Application number: KR1020000015903A
Authority: KR
Inventors: 이세룡
Original assignee: 이세룡
Priority date: 2000-03-28
Filing date: 2000-03-28
Publication date: 2000-07-05
Also published as: KR100341418B1

Abstract

PURPOSE: A method for building a search database of documents, and a method for searching documents using that database, are provided. Search sentences that a person searching for documents by computer is likely to use are anticipated in advance, and each is extracted as an expected search sentence in sentence form, divided by meaning into a plurality of parts. The search database is built by registering the documents together with the extracted expected search sentences. A searcher enters a sentence that directly expresses the search intent, likewise divided by meaning into a plurality of parts, and the documents are searched through the built database. The searcher can also search the expected search sentences stored in the database and use a retrieved expected search sentence to search for documents. CONSTITUTION: The method for building the search database comprises four processes. The first process inputs basic information about a document. The second process extracts a search sentence expected for that document, divides it into parts by meaning, and inputs it as an expected search sentence composed of a plurality of parts. The third process converts the input data into an internal data format. The fourth process enters the converted data into the search database as a record, matched to each field of the database.

Description

A method for establishing database for searching files and a method for searching file by use of the database

The present invention relates to a method for building a search database of document data and a method for searching document data using that database. Search sentences that a person searching for document data by computer is likely to use are anticipated in advance and extracted as expected search sentences in sentence form, each divided by meaning into a plurality of parts. The document data is entered into a database together with the extracted expected search sentences to build a search database. A searcher enters a search sentence that directly expresses the search intent, likewise divided by meaning into a plurality of parts, and searches the document data through the search database. The searcher can also search the expected search sentences stored in the search database and use a retrieved expected search sentence to search the document data, so that searching is easy to use, fast, and efficient.

Among the data searches widely used on computers today is Internet search, in which an Internet user tries to find, among the websites scattered across the Internet, those that contain the document data they need. This is currently done with search engines, which concern how a search program manipulates and manages information about websites. Search engines in commercial use scan many websites on the Internet, index the address (URL) of each website that contains a given keyword under that keyword, store the index in a database, and serve it to Internet users. That is, a conventional Internet user who wants to search for data connects to a search engine site by entering its URL (Uniform Resource Locator) and enters a search expression for the desired information on the site's search screen; the engine then uses the relationship to that search expression to reach related information among the data scattered across the Internet and provides the user with the information that matches the expression. Conventional Internet search engines use a directory method, a robot method, or a meta method. In the directory method, a user or operator enters Internet information by hand, and the entries are sorted into directories according to the service's own classification scheme. A registered entry holds the address of the website and a description of its representative content; the full text of the site's documents is not registered, only the representative address and the representative content. In the robot method, a program called a robot, rather than a person, traverses the Internet and registers websites automatically, so no input from users or operators is needed and the data is refreshed automatically at regular intervals. Unlike the directory method, which registers only the address of a site's front page, the content of every document is stored, so the amount of data reaches the tens of millions. The meta method has no data or search engine of its own; when a searcher enters a search expression, it forwards the expression to other search engines, receives their results, and collates and sorts them for display.

Among these Internet search engines, the directory method allows searching by field, the retrieved data is likely to match the information sought, and it is useful for finding websites on a topic rather than documents containing specific information; however, it cannot perform fine-grained searches. The robot method searches websites document by document and can therefore search even for a specific word, but because the amount of data is large it returns too much irrelevant content that is far from what was sought, and because an accurate search requires a sufficient prior understanding of search expressions, it is difficult for beginners to use. The meta method has the effect of using several search engines at once and gives good results, but it is slow and is now rarely used. Furthermore, in conventional Internet search engines the searcher cannot use an expression of the search purpose directly as a search sentence, but must compose a search term or search expression that represents the content of the website sought. This makes searching inconvenient. When the searcher does not know the content of the target material in detail, finding a search term that represents that content is not easy, and the process is particularly difficult for beginners unfamiliar with Internet search. In addition, because the search is performed with a search term or search expression representing the content of the website, anything in the search target that resembles the term or expression is retrieved unconditionally, even if it differs from the searcher's original purpose. The volume of results is therefore excessive, so that the results must be searched again or cannot be used properly.

Likewise, searching document data shared over an intranet, in which a company or a particular organization builds its internal information system using the Web technology of the Internet, works in the same way as the Internet search engines described above and therefore has the same problems.

The present invention is intended to solve the problems described above. Search sentences that a person searching for document data by computer is likely to use are anticipated in advance and extracted as expected search sentences in sentence form, each divided by meaning into a plurality of parts. The document data is entered into a database together with the extracted expected search sentences to build a search database. A searcher enters a search sentence that directly expresses the search intent, likewise divided by meaning into a plurality of parts, and searches the document data through the search database. The searcher can also search the expected search sentences stored in the search database and use a retrieved expected search sentence to search the document data. The object of the invention is thus to provide a method for building a document data search database, and a method for searching document data using it, that is easy to use, fast, and efficient.

FIG. 1 is a diagram showing the data flow of the present invention.

FIG. 2 is a flowchart showing the process of building the document data search database.

FIG. 3 is a flowchart showing the process by which a searcher searches document data through the search database.

FIG. 4 is a flowchart showing the process by which a searcher searches expected search sentences through the search database.

FIG. 5 is a flowchart of one embodiment showing how the similarity rate calculation step proceeds.

FIG. 6 is a diagram showing one embodiment of a screen for inputting basic document information and expected search sentences in order to build the search database.

FIG. 7 is a diagram showing one embodiment of a screen that displays the input result so that the basic document information and expected search sentences entered into the search database can be checked.

FIG. 8 is a diagram showing one embodiment of a screen provided with a search sentence input unit in which a searcher enters a search sentence.

FIG. 9 is a diagram showing one embodiment of a screen that displays the search results output after a searcher performs a document data search.

FIG. 10 is a diagram showing one embodiment of a screen that displays the search results output after a searcher performs an expected search sentence search.

FIG. 11 is a diagram showing one embodiment of a screen on which the input result is output when data has been entered into the search database.

* Explanation of reference numerals for the main parts of the drawings *

10: document address input unit
11: document name input unit
12: document description input unit
13: expected search sentence input unit
21: unique number field
22, 23, 24: expected search sentence fields
25: document address field
26: document name field
27: document description field
28: similarity rate field
31: search sentence input unit
32: expected search sentence search button
33: search button
34: expected search sentence display unit

To achieve the above object, the method according to the present invention for building a document data search database comprises:

a document data basic information input step of inputting basic information about the document data;

an expected search sentence input step of extracting the search sentences expected for the document data entered in the document data basic information input step, and inputting each as an expected search sentence in sentence form, divided by meaning into a plurality of parts;

a data conversion step of converting the data input in the document data basic information input step and the expected search sentence input step into an internal data format;

a search database opening step of opening the search database after the data conversion step;

a search database input step of entering the data converted in the data conversion step into the search database in record form, matched to each field of the search database; and

a search database closing step of closing the search database.

The document data basic information input step inputs the basic data needed to build the search database for document data search: basic information about the document data is obtained and entered. This basic information includes the name of the document data, a brief description of it, and the address where it is located. For document data on the Internet, the address is the Internet site address (URL, Uniform Resource Locator); for document data on a company intranet, it is the address indicating the path where the document data resides. Any other information considered necessary for building the search database may be entered in addition to the basic information listed above.

The expected search sentence input step freely extracts and enters expected search sentences that a searcher is likely to intend as search sentences when searching. Extracting an expected search sentence does not mean, as in conventional search engines, taking as the search term a keyword (a word or combination of words) that represents the content of the document data. Instead, words that properly express what a searcher is actually expected to intend when searching for the document data are combined into a search sentence in sentence form, divided by meaning into a plurality of parts, and this sentence is extracted as the expected search sentence. In other words, what is extracted is not a document-centered keyword representing the content of the document, but a searcher-centered expression of the searcher's intent, entered in a plurality of parts divided by meaning.

As one example, the expected search sentence is extracted and entered as a sentence consisting of a first part corresponding to 'when, where, who', a second part corresponding to 'what', and a third part corresponding to 'do how'. Because a searcher searches in order to obtain something they want, the third part preferably ends with '싶다' ('want to'), so that the search sentence reads as a smooth, complete sentence. The division into three parts is only one embodiment; the number of parts can be increased or decreased according to the construction conditions of the search database. Not every part of the expected search sentence input unit has to be filled in; an expected search sentence can be formed by entering terms only in the parts needed for the conditions the searcher is expected to intend. The way an expected search sentence is divided into parts by meaning can of course be adapted in various ways within the technical idea described above.

As an example of extracting search sentences in sentence form, suppose the document data is about hiking. A conventional search engine would use keywords such as '등산' (hiking) or '산' (mountain). The present invention instead extracts, as expected search sentences, sentences that fully reflect what a searcher looking for hiking material is expected to intend and to use when searching, even if they are not directly related to the content of the document data: '기분전환을 하고 싶다' (I want a change of mood), '건강을 증진시키고 싶다' (I want to improve my health), '여가생활을 하고 싶다' (I want to enjoy my leisure time), '등산을 하고 싶다' (I want to go hiking), or '산에 가고 싶다' (I want to go to the mountains). Each extracted sentence is divided by meaning into the first, second, and third parts described above and entered in the corresponding parts. When expected search sentences are extracted in advance in this way and entered into the search database as it is built, the searcher does not need to think of separate keywords, but simply types what they currently intend, in sentence form, into the search sentence input unit. Searching is therefore simple, and even beginners with little experience of document search can search conveniently. Moreover, because a search sentence is entered in several parts divided by meaning, the parts of the expected search sentences retrieved by the expected search sentence search described later can be combined to compose a search sentence, which makes searching easier still.

The data conversion step unifies the data entered in the data input steps into the internal data format of the search database, so that entering the data into the search database, reading it back out, or computing a similarity rate from it works normally without errors. For example, spaces to the left and right of an input string are removed when the basic document information and expected search sentences are entered, consecutive spaces within a string are reduced to a single space, and, when the document address is an Internet site address (URL), the '/' at the end of the URL is removed so that URLs have a uniform format. In short, the input data is converted to fit a fixed format.
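The three conversions named above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and treating an address as a URL when it begins with `http://` or `https://` is an assumption, since the text does not say how a URL is recognized.

```python
import re


def normalize_text(value: str) -> str:
    """Remove leading/trailing spaces and reduce runs of spaces to one."""
    return re.sub(r" +", " ", value.strip())


def normalize_address(address: str) -> str:
    """Normalize a document address; drop the trailing '/' of a URL.

    ASSUMPTION: an address counts as a URL if it starts with
    http:// or https://. Intranet paths are left untouched.
    """
    address = normalize_text(address)
    is_url = address.lower().startswith(("http://", "https://"))
    if is_url and address.endswith("/"):
        address = address[:-1]
    return address
```

Applying the same functions to both the stored expected search sentences and the searcher's input keeps later comparisons from failing on spacing differences alone.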

The search database opening step opens the search database so that the data entered in the data input steps and converted in the data conversion step can be stored in it.

The search database input step builds the search database by entering into it the data that was entered in the data input steps and converted in the data conversion step. The converted data is entered with the unique number, expected search sentence, document address, document name, document description, and similarity rate each as one field; the values entered in these fields form one record, and the data is stored in the search database in record form. Because the expected search sentence consists of a plurality of parts, such as the first, second, and third parts described above, it is split across an expected search sentence 1 field, an expected search sentence 2 field, an expected search sentence 3 field, and so on, with each part entered in its corresponding field. The similarity rate is a value, obtained by a predetermined similarity rate calculation method, that expresses how similar the search sentence entered by a searcher is to an expected search sentence stored in the search database. As basic information and expected search sentences for document data continue to be entered, each set of values is entered into the corresponding fields to form a record, and the search database grows as the number of records increases.
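The record layout described above might be modeled as follows. This is only a sketch: the class and function names are hypothetical, an in-memory list stands in for the search database, assigning unique numbers sequentially is an assumption, and the sample document and address are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SearchRecord:
    unique_number: int           # field 21
    expected_1: str              # field 22: 'when, where, who'
    expected_2: str              # field 23: 'what'
    expected_3: str              # field 24: 'do how' + 'want to'
    document_address: str        # field 25
    document_name: str           # field 26
    document_description: str    # field 27
    similarity_rate: float = 0.0 # field 28, filled in at search time


def add_record(database: List[SearchRecord],
               parts: Tuple[str, str, str],
               address: str, name: str, description: str) -> SearchRecord:
    """Append one record; the unique number is assumed to be sequential."""
    record = SearchRecord(len(database) + 1, parts[0], parts[1], parts[2],
                          address, name, description)
    database.append(record)
    return record


# Two expected search sentences registered for the same hiking document.
db: List[SearchRecord] = []
add_record(db, ("", "hiking", "want to go"),
           "http://example.com/hiking", "Hiking guide", "Trails and tips")
add_record(db, ("", "health", "want to improve"),
           "http://example.com/hiking", "Hiking guide", "Trails and tips")
```

Leaving a part empty, as in the first field here, corresponds to the option of filling in only the parts that are needed.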

The search database closing step closes the search database and ends input once building of the search database is complete.

After the search database closing step, an input result output step may further be included, which outputs the input result so that it can be confirmed that the content was entered correctly and the search database was built.

The method by which a searcher searches document data using a search database built by the above method of the present invention comprises:

a search sentence input step in which the searcher enters, in a search sentence input unit, a search sentence in sentence form divided by meaning into a plurality of parts;

a data conversion step of converting the search sentence data entered in the search sentence input step into an internal data format;

a search database opening step of opening the search database;

a similarity rate calculation step of reading each record stored in the search database, calculating the similarity rate between the expected search sentence in the record's expected search sentence fields and the search sentence entered by the searcher, and entering the calculated similarity rate into the record's similarity rate field to update the record;

a record set construction step of constructing a record set by selecting, from the records updated in the similarity rate calculation step, only those whose similarity rate field holds a value greater than 0;

a record set output step of outputting the content of each record in the record set constructed in the record set construction step to the screen; and

a search database closing step of closing the search database.
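The opening, record set construction, output, and closing steps can be sketched with Python's standard `sqlite3` module. Everything concrete here is an assumption made for illustration: the table and column names are hypothetical, an in-memory database stands in for an existing search database, and the rows are taken to carry similarity rates already written by the similarity rate calculation step.

```python
import sqlite3


def select_recordset(rows):
    """Open the database, keep records with similarity > 0, then close."""
    conn = sqlite3.connect(":memory:")  # search database opening step
    try:
        conn.execute(
            "CREATE TABLE search_records ("
            "unique_number INTEGER PRIMARY KEY, document_name TEXT, "
            "document_address TEXT, similarity_rate REAL)"
        )
        conn.executemany(
            "INSERT INTO search_records VALUES (?, ?, ?, ?)", rows
        )
        # Record set construction step: only similarity rates above 0.
        recordset = conn.execute(
            "SELECT unique_number, document_name, document_address, "
            "similarity_rate FROM search_records WHERE similarity_rate > 0"
        ).fetchall()
    finally:
        conn.close()  # search database closing step
    return recordset


def print_recordset(recordset):
    """Record set output step: show each selected record."""
    for number, name, address, rate in recordset:
        print(f"{number}: {name} <{address}> similarity={rate}")
```

Filtering on a rate greater than 0 means that a record sharing nothing with the search sentence never reaches the output screen, whatever scoring method produced the rate.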

In the search sentence input step, a searcher who wants to search document data enters the search condition for finding it: in the search sentence input unit, the searcher enters a sentence expressing what they intend to find, as a search sentence in sentence form divided by meaning into a plurality of parts. If the search sentence input unit is divided into three parts, a first part corresponding to 'when, where, who', a second part corresponding to 'what', and a third part corresponding to 'do how', the searcher enters what they intend in the corresponding parts. The search sentence input unit is not necessarily limited to a first, second, and third part; the number of parts can be increased or decreased according to the construction conditions of the search database. It is divided into the same number of parts as the expected search sentences stored in the search database, so that corresponding parts are compared with each other during a search, which improves search efficiency. The searcher does not have to fill in every part of the search sentence input unit, and may enter terms only in the parts needed for the intended conditions.

The data conversion step unifies the search sentence data entered in the search sentence input step into the internal data format of the search database, so that no errors occur when the similarity rate is calculated from it.

The search database opening step opens the search database so that the search can proceed.

The similarity rate calculation step works on the records already stored in the search database, in which the unique number, expected search sentence, document address, document name, and document description fields hold the corresponding content. To use these records for the document data search, they are read one at a time; the similarity rate between the expected search sentence in each record's expected search sentence fields and the search sentence entered by the searcher is calculated by a predetermined similarity rate calculation method; and each calculated similarity rate is entered into the similarity rate field of its record to update the record. The similarity rate calculation method can be configured in various ways; an embodiment is described later. The step covers every record in the search database: one record is read, its similarity rate is calculated, and the record is updated; then the next record is read, its similarity rate is calculated, and it is updated; and this is repeated until reading, similarity rate calculation, and updating have been performed for all records. When all records have been updated, the next step, the record set construction step, proceeds.

FIG. 5 is a flowchart of one embodiment of the similarity rate calculation step. As shown in FIG. 5, the step comprises: a first step of moving the read position in the search DB to the first row of records; a second step of determining whether the position reached is the last row; a third step of reading the record when the second step determines that the position is not the last row; a fourth step of calculating the similarity rate between the expected search sentence of the record read in the third step and the search sentence entered by the searcher; a fifth step of updating the record by writing the similarity rate calculated in the fourth step into the similarity rate field of the current record; and a sixth step of moving to the position of the next row once the update in the fifth step is complete. If the second step determines that the position is the last row, record reading ends immediately and the recordset construction step begins.
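The six steps of FIG. 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Record` class, the in-memory list standing in for the search DB, and the `word_overlap` scoring function are all hypothetical, and the "last row" test is read here as an end-of-records check. Any similarity rate calculation method could be passed in as `score`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    # Hypothetical record layout; the field names are illustrative only.
    uid: int
    expected_sentence: str
    doc_address: str
    similarity: float = 0.0


def word_overlap(expected: str, query: str) -> float:
    # Placeholder scoring (an assumption): percentage of query words
    # that occur somewhere in the expected search sentence.
    words = query.split()
    if not words:
        return 0.0
    return 100.0 * sum(w in expected for w in words) / len(words)


def update_similarity(db: List[Record], query: str,
                      score: Callable[[str, str], float]) -> None:
    pos = 0                                         # step 1: go to the first record
    while pos < len(db):                            # step 2: stop after the last record
        rec = db[pos]                               # step 3: read the record
        rate = score(rec.expected_sentence, query)  # step 4: calculate the rate
        rec.similarity = rate                       # step 5: update the record
        pos += 1                                    # step 6: move to the next record
```

Passing the scoring function as a parameter mirrors the text's point that the calculation method is interchangeable while the read, score, update loop stays the same.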

In the recordset construction step, only those updated records whose similarity rate is greater than 0 are selected and assembled into a recordset. A similarity rate of 0 or less means that the expected search sentence is not similar to the search sentence entered by the searcher, so the document data of that record is not what the searcher wants and need not be presented. Only records with a similarity rate greater than 0 are useful to the searcher, so only those records are selected for the recordset.

In the recordset output step, the contents of each record in the recordset built in the recordset construction step are displayed on the screen for the searcher. By checking the displayed recordset, the searcher can see the list of document data related to the search sentence entered. The content provided through the recordset output includes the search sentence entered by the searcher, the expected search sentence, the similarity rate, the document name, the document address, and the document description. By checking the similarity rate between the entered search sentence and the expected search sentences stored in the search database, the searcher can find the document data that best matches the intended search purpose. When the recordset is displayed, it is preferable to compare the similarity rate values of the records and sort them from the highest similarity rate to the lowest, so that the searcher can review the document data in order of similarity.
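The selection and ordering described for the recordset construction and output steps can be sketched as below. The dictionary keys (`doc_name`, `similarity`) are hypothetical names chosen for the example, and the records are assumed to have been scored already.

```python
from typing import Dict, List


def build_recordset(records: List[Dict]) -> List[Dict]:
    # Keep only records whose similarity rate is greater than 0 ...
    selected = [r for r in records if r["similarity"] > 0]
    # ... and order them from the highest rate to the lowest for display.
    return sorted(selected, key=lambda r: r["similarity"], reverse=True)


if __name__ == "__main__":
    scored = [
        {"doc_name": "A", "similarity": 0.0},
        {"doc_name": "B", "similarity": 25.0},
        {"doc_name": "C", "similarity": 60.0},
    ]
    for rec in build_recordset(scored):
        print(rec["doc_name"], rec["similarity"])
```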

In the search database closing step, the search database is closed, because the recordset output step completes the search for the search sentence entered by the searcher.

When document data is searched using a search database built by the document data search database construction method of the present invention, the searcher can also be allowed to search the expected search sentences already entered and stored in the search database and to use them for the document data search. This further increases the convenience and efficiency of searching.

As described above, the method by which a searcher searches for expected search sentences, using a search database built by the document data search database construction method of the present invention, comprises:

a similarity rate calculation step of reading each record stored in the search database, calculating the similarity rate between the expected search sentence in each record's expected search sentence field and the search sentence entered by the searcher, and updating each record by writing the calculated similarity rate into its similarity rate field;

a recordset construction step of building a recordset by selecting only those records whose similarity rate field, as updated in the similarity rate calculation step, holds a similarity rate greater than 0; and

an expected search sentence output step of displaying the expected search sentences from the recordset built in the recordset construction step in the expected search sentence display unit of the screen.

Among the steps of the method by which the searcher searches for expected search sentences, the search sentence input step, data conversion step, search database opening step, similarity rate calculation step, recordset construction step, and search database closing step serve the same functions as the corresponding steps of the method, described above, by which the searcher searches document data using a search sentence.

In the expected search sentence output step, the expected search sentences found are shown in the expected search sentence display unit, each as a sentence divided by meaning into a plurality of parts. As in the example described above, depending on how the expected search sentence is structured, it may be presented to the searcher as a sentence consisting of a first part corresponding to 'when, where, who', a second part corresponding to 'what', and a third part corresponding to 'how'. If several records have a similarity rate greater than 0, multiple expected search sentences are displayed. To save the searcher the effort of typing, it is preferable to link each displayed expected search sentence so that clicking it moves that sentence into the search sentence input unit.
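As a rough sketch of this output step, the following Python models a three-part expected search sentence, the selection of sentences to display, and the click that copies a sentence into the input fields. The names `ExpectedSentence`, `suggestions`, `on_click`, and the `input1` to `input3` keys are hypothetical; the patent does not prescribe a data model or user-interface API.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ExpectedSentence(NamedTuple):
    part1: str  # 'when, where, who'
    part2: str  # 'what'
    part3: str  # 'how'


def suggestions(scored: Iterable[Tuple[ExpectedSentence, float]]) -> List[ExpectedSentence]:
    # Show only the expected search sentences whose similarity rate exceeds 0.
    return [sentence for sentence, rate in scored if rate > 0]


def on_click(suggestion: ExpectedSentence) -> Dict[str, str]:
    # Simulates the link: the three parts are copied into the three
    # search sentence input fields so the searcher need not retype them.
    return {
        "input1": suggestion.part1,
        "input2": suggestion.part2,
        "input3": suggestion.part3,
    }
```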

A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 shows the data flow of the present invention. As shown in FIG. 1, data is entered into the search database through the input process, thereby building the search database. A searcher who wants to find document data can search it through the search process using the search database thus built. The searcher can also use the search sentence search process to look up expected search sentences stored in the search database and use them in the document data search.

FIG. 2 is a flowchart of the process of building the document data search database. As shown in FIG. 2, the person building the search database first obtains basic information about the document data to be entered and extracts expected search sentences for that document data. Extracting expected search sentences is the most distinctive feature of the present invention: from the searcher's point of view, the search sentences that a searcher would be likely to use to find the document data are extracted and written in sentence form. Because the expected search sentences are in sentence form, searchers can use them easily when searching. The basic information obtained and the expected search sentences extracted are then entered. The data entered in the data input step is converted into an internal data format. The search database is then opened and the data is entered as a record matching the fields of the search database. When entry is complete, the search database is closed and the process ends.
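The build flow of FIG. 2 (enter data, convert it, open the database, insert a record, close) might look like the following sketch using Python's standard `sqlite3` module. The table name `search_db`, the column names, and the use of whitespace trimming as the "internal data format" conversion are assumptions made for illustration; the patent does not name a database engine or specify the conversion.

```python
import sqlite3
from typing import Tuple


def open_search_db(path: str = ":memory:") -> sqlite3.Connection:
    # Open (and, if needed, create) the search database.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS search_db (
               uid INTEGER PRIMARY KEY AUTOINCREMENT,
               expected1 TEXT, expected2 TEXT, expected3 TEXT,
               doc_address TEXT, doc_name TEXT, doc_description TEXT,
               similarity REAL DEFAULT 0.0)"""
    )
    return conn


def add_document(conn: sqlite3.Connection, address: str, name: str,
                 description: str, expected: Tuple[str, str, str]) -> int:
    # "Data conversion" stand-in (an assumption): trim surrounding whitespace.
    part1, part2, part3 = (p.strip() for p in expected)
    cur = conn.execute(
        "INSERT INTO search_db (expected1, expected2, expected3,"
        " doc_address, doc_name, doc_description) VALUES (?, ?, ?, ?, ?, ?)",
        (part1, part2, part3, address.strip(), name.strip(), description.strip()),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = open_search_db()
    add_document(conn, "http://example.com/a", "Doc A", "A sample document",
                 ("today at home I", "a report", "write"))
    conn.close()  # closing the search database ends the build process
```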

FIG. 3 is a flowchart of the process by which a searcher searches document data through the search database. As shown in FIG. 3, the searcher enters a search sentence in the search sentence input unit as a sentence divided by meaning into a plurality of parts. The search sentence data entered is then converted into an internal data format. Next, the search database is opened and each stored record is read; for each record, the similarity rate between the expected search sentence in the record's expected search sentence field and the entered search sentence is calculated, and the record is updated by writing the calculated similarity rate into its similarity rate field. A recordset is then built by selecting only those updated records whose similarity rate is greater than 0. Finally, the recordset is displayed on the screen so that the searcher can see the contents of each record, sorted from the record with the highest similarity rate to the record with the lowest.

FIG. 4 is a flowchart of the process by which a searcher searches for expected search sentences through the search database. As shown in FIG. 4, the searcher enters a search sentence in the search sentence input unit as a sentence divided by meaning into a plurality of parts. The search sentence data entered is converted into an internal data format. Next, the search database is opened and each stored record is read; for each record, the similarity rate between the content of the record's expected search sentence field and the entered search sentence is calculated, and the record is updated by writing the calculated similarity rate into its similarity rate field. A recordset is then built from only those updated records whose similarity rate is greater than 0. The expected search sentences in the recordset are displayed in the expected search sentence display unit of the screen, and the search database is closed. To save the searcher the effort of typing, each displayed expected search sentence is linked so that clicking it moves that sentence into the search sentence input unit.

Of the accompanying drawings not yet described, FIG. 6 shows one embodiment of a screen for entering basic document information and expected search sentences in order to build the search database. As shown in FIG. 6, the screen has a document address input unit (10) for entering the document address, a document name input unit (11) for entering the document name, a document description input unit (12) for entering the document description, and an expected search sentence input unit (13) for entering the expected search sentence. The expected search sentence input unit is divided by meaning into a plurality of parts, so that, as shown in the figure, an expected search sentence can be entered as a sentence made up of a first part corresponding to 'when, where, who', a second part corresponding to 'what', and a third part corresponding to 'how'.

FIG. 7 shows one embodiment of a screen that displays the input result so that the basic document information and expected search sentences entered into the search database can be checked. As shown in FIG. 7, when one record of the search database holds a unique number, an expected search sentence divided into three parts, a document data address, a document name, a document description, and a similarity rate, the record comprises a unique number field (21), an expected search sentence 1 field (22), an expected search sentence 2 field (23), an expected search sentence 3 field (24), a document data address field (25), a document name field (26), a document description field (27), and a similarity rate field (28), through which the corresponding data is entered and displayed.

FIG. 8 shows one embodiment of a screen with a search sentence input unit in which the searcher enters a search sentence. As shown in FIG. 8, the search sentence input unit (31) is divided by meaning into a plurality of parts, so that, as in FIG. 6, a search sentence can be entered as a sentence made up of a first part corresponding to 'when, where, who', a second part corresponding to 'what', and a third part corresponding to 'how'. When the searcher enters a search sentence in the search sentence input unit (31) and clicks the search sentence search button (32), the expected search sentences whose similarity rate to the entered search sentence is greater than 0 are displayed in the expected search sentence display unit (34). When the searcher enters a search sentence in the search sentence input unit (31) and clicks the search button (33), the document data whose similarity rate to the search sentence is greater than 0 is displayed, sorted by similarity rate from the highest record to the lowest.

FIG. 9 shows one embodiment of a screen that displays the search results after a searcher has searched for document data. As shown in FIG. 9, the results show the expected search sentence, document name, document data address, and document description entered when the search database was built, together with the calculated similarity rate between the search sentence entered by the searcher and the expected search sentence, to assist the searcher.

FIG. 10 shows one embodiment of a screen that displays the results after a searcher has searched for expected search sentences. As shown in FIG. 10, when the searcher enters a search sentence in the search sentence input unit (31) and clicks the search sentence search button (32), the search database is consulted and the expected search sentences whose similarity rate to the entered search sentence is greater than 0 are displayed in the expected search sentence display unit (34).

FIG. 11 shows one embodiment of a screen on which the input result is displayed after data has been entered into the search database. As shown in FIG. 11, the input result is displayed so that the person building the search database can confirm that the basic document information and the expected search sentences were entered correctly.

As noted above, the similarity rate between an expected search sentence and a search sentence can be calculated in various ways. One embodiment of a similarity rate calculation method is described below.

Suppose sentence A is '유용하고 쉬운 인터넷 검색을' ("a useful and easy Internet search") and sentence B is '유용하고도 재미있는 검색을' ("a useful and also fun search"). To calculate the similarity rate between sentence A and sentence B, '유용하고', '쉬운', '인터넷', and '검색을' in sentence A are each counted as one word, and '유용하고도', '재미있는', and '검색을' in sentence B are each counted as one word. The rate is then calculated from these counts as follows.

Sentence A has four words and sentence B has three, so the total number of words is seven. The number of words in each sentence that also occur in the other sentence is as follows.

Words in sentence A that are substrings of sentence B: 2 ('유용하고', '검색을')

Words in sentence B that are substrings of sentence A: 1 ('검색을')

The total number of matched words is therefore three.

The similarity rate between sentence A and sentence B is then obtained from the total number of matched words (3) and the total number of words (7).

The similarity rate calculation method described above is only one embodiment; the similarity rate may be calculated mathematically in various other ways.
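The word-matching example above can be reproduced with a short Python sketch. The counting rule (a word matches if it is a substring of the other sentence) follows the worked example. The final ratio is an assumption: this text gives the word counts (7 in total, 3 matched) but not the closing expression, so the sketch simply divides matched words by total words and expresses the result as a percentage. The function names are hypothetical.

```python
from typing import Tuple


def match_counts(a: str, b: str) -> Tuple[int, int]:
    # Count the words of each sentence that appear as substrings of the other.
    a_in_b = sum(1 for word in a.split() if word in b)
    b_in_a = sum(1 for word in b.split() if word in a)
    return a_in_b, b_in_a


def similarity_rate(a: str, b: str) -> float:
    # ASSUMED formula: matched words / total words, as a percentage.
    total_words = len(a.split()) + len(b.split())
    if total_words == 0:
        return 0.0
    return 100.0 * sum(match_counts(a, b)) / total_words


if __name__ == "__main__":
    sentence_a = "유용하고 쉬운 인터넷 검색을"
    sentence_b = "유용하고도 재미있는 검색을"
    print(match_counts(sentence_a, sentence_b))     # (2, 1), as in the example
    print(similarity_rate(sentence_a, sentence_b))  # 3 of 7 words, under the assumed ratio
```

Note that the substring rule is asymmetric: '유용하고' is found inside '유용하고도', but '유용하고도' is not found in sentence A, which is why the two counts differ.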

Based on the technical idea of the present invention described above, the document data search database construction method and the document data search method using it can be implemented by providing a search site on the Internet so that searchers can search document data over the Internet. They can also be implemented in a form in which the searcher enters a search sentence and receives the search results through other wired or wireless communication means, or as a stand-alone document data search program.

It should also be noted that, based on the technical idea of the present invention, the search database construction and the search method using it can be applied to searching document data on Internet sites, and likewise to searching document data on an intranet built within a company or organization.

By building a search database of document data according to the present invention and searching document data with it, the searcher is spared the inconvenience of choosing search terms and composing search expressions, and can search conveniently by entering what is intended as an ordinary sentence. Because the search sentence is divided by meaning and the search is performed only against parts of the same meaning, search speed is improved. In addition, when the searcher is unsure what search sentence to use, the searcher can simply enter the intent and search the expected search sentences, obtain expected search sentences similar to that intent, and proceed with the search.

Claims

A method of building a document data search database for searching document data, comprising:

a document data basic information input step of inputting basic information about the document data;

an expected search sentence input step of extracting the search sentences expected for the document data entered in the document data basic information input step and inputting each expected search sentence in the form of a sentence composed of a plurality of parts;

a data conversion step of converting the data input in the document data basic information input step and the expected search sentence input step into an internal data format; and

a search database input step of inputting the data converted in the data conversion step into a search database as a record matching each field of the search database, whereby the document data can be searched using the expected search sentences.

The method of claim 1, further comprising an input result output step of displaying the input result so that it can be confirmed whether the input content was correctly entered and built into the search database.

The method of claim 1, wherein the document data basic information input in the document data basic information input step includes at least a name of document data, a description of document data, and an address of document data.

The method of claim 1, wherein the expected search sentence input in the expected search sentence input step is divided into three parts: a first part corresponding to 'when, where, who', a second part corresponding to 'what', and a third part corresponding to 'how'.

The method of claim 1, wherein the record of the search database generated in the search database input step includes at least a unique number field, an expected search sentence field, a document address field, a document name field, a document description field, and a similarity rate field, into each of which the corresponding data is entered.

A method of searching document data using a search database built by a document data search database construction method that enables document data to be searched using expected search sentences, comprising:

a search sentence input step in which the searcher inputs, in a search sentence input unit, a search sentence in the form of a sentence divided by meaning into a plurality of parts;

a data conversion step of converting the search sentence data input in the search sentence input step into an internal data format;

a similarity rate calculation step of reading each record stored in the search database, calculating the similarity rate between the expected search sentence of each record and the search sentence entered by the searcher, and updating each record by writing the calculated similarity rate into its similarity rate field;

a recordset construction step of building a recordset by selecting only those records whose similarity rate field, as updated in the similarity rate calculation step, holds a similarity rate greater than 0; and

a recordset output step of displaying on the screen the contents of each record of the recordset built in the recordset construction step.

A method of searching expected search sentences so that document data can be searched, comprising:

a similarity rate calculation step of reading each record stored in the search database, calculating the similarity rate between the expected search sentence of each record and the search sentence entered by the searcher, and updating each record by writing the calculated similarity rate into its similarity rate field;

a recordset construction step of building a recordset by selecting only those records whose similarity rate field, as updated in the similarity rate calculation step, holds a similarity rate greater than 0; and

an expected search sentence output step of displaying the expected search sentences from the recordset built in the recordset construction step in the expected search sentence display unit of the screen.