KR20090024460A - Information retrieval apparatus and method for multi languages

Info

Publication number: KR20090024460A
Application number: KR1020070089498A
Authority: KR
Inventors: 김신강
Original assignee: 엘지전자 주식회사
Priority date: 2007-09-04
Filing date: 2007-09-04
Publication date: 2009-03-09

Abstract

A multilingual-compatible information retrieval apparatus and method are provided that efficiently search data stored in a database according to the language, country, and product characteristics in use, without substantially changing the structure of the retrieval apparatus. A code conversion unit (100) converts a character string input by a user for information retrieval into a preset character code. A keyword extraction unit (110) extracts a meaningful keyword from the converted character string. A search term matching unit (120) matches the extracted keyword against data stored in a database (140) and outputs the matched data as the information search result. A sorting unit (130) sorts the information search results in a predetermined order for display to the user.

Description

Multilingual-compatible information retrieval apparatus and method

FIG. 1 is a block diagram showing a schematic configuration of a multilingual-compatible information retrieval apparatus according to the present invention.

FIG. 2 is a block diagram showing an embodiment of the configuration of an information retrieval apparatus according to the present invention.

FIG. 3 is a flowchart showing an embodiment of a multilingual-compatible information retrieval method according to the present invention.

The present invention relates to an apparatus and method for retrieving information from data stored in a database, and more particularly to an information retrieval apparatus and method compatible with multiple languages.

Information retrieval means finding desired information, as needed, in a database that holds a large amount of information. In general, the meaningful part is extracted from a plain character string that the user enters for a search, and the data stored in the database is searched using that part.

In such information retrieval systems, the efficient structure and search algorithm differ depending on the language, country, or product in use, so the system specification has had to be changed whenever market conditions change.

An object of the present invention is to provide a multilingual-compatible information retrieval apparatus and method that operate efficiently in accordance with the usage environment, such as the language, country, or product in use.

To solve the above technical problem, an information retrieval apparatus according to the present invention comprises: a code conversion unit that converts an input character string into a predetermined multilingual-compatible character code; a keyword extraction unit that extracts a keyword from the converted character string; a database that stores data; and a search term matching unit that matches the extracted keyword against the data stored in the database and outputs the result as an information search result.

To solve the above technical problem, an information retrieval method according to the present invention comprises: converting an input character string into a predetermined multilingual-compatible character code; extracting a keyword from the converted character string; and matching the extracted keyword against data stored in a database and outputting the result as an information search result.

The information retrieval method may preferably be embodied as a computer-readable recording medium on which a program for executing the method on a computer is recorded.

Hereinafter, the information retrieval apparatus according to the present invention will be described in detail with reference to the accompanying drawings. FIG. 1 is a block diagram showing a schematic configuration of a multilingual-compatible information retrieval apparatus according to the present invention. The illustrated apparatus includes a code conversion unit 100, a keyword extraction unit 110, a search term matching unit 120, a sorting unit 130, and a database 140.

Referring to FIG. 1, the code conversion unit 100 converts a character string input by the user for information retrieval into a preset character code. The code conversion unit 100 preferably converts the input character string into a unified character code that is compatible with multiple languages.

Because retrieval from the database 140 is performed on the basis of this multilingual-compatible character code, the present invention can provide a multilingual-compatible information retrieval apparatus.

The keyword extraction unit 110 extracts a meaningful keyword from the converted character string. A keyword is a word or symbol used in a data search to find information containing specific content.

The search term matching unit 120 matches the extracted keyword against the data stored in the database 140 and outputs the matched data as the information search result.

The sorting unit 130 sorts the information search results in a predetermined order and displays them so that the user can view them.
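The arrangement of the four units and the database can be sketched as follows. This is only an illustrative sketch: the class name `RetrievalPipeline`, the method names, the EUC-KR input encoding, the whitespace tokenization, and the "any keyword" matching rule are all assumptions chosen for the example and are not specified by this document.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalPipeline:
    """Hypothetical composition of units 100-130 and database 140."""
    database: list[str] = field(default_factory=list)  # stands in for database 140
    source_encoding: str = "euc_kr"                     # assumed input encoding

    def convert(self, raw: bytes) -> str:
        # Code conversion unit 100: bytes in a local code -> Unicode string.
        return raw.decode(self.source_encoding)

    def extract(self, text: str) -> list[str]:
        # Keyword extraction unit 110: simplest possible rule (split on spaces).
        return text.split()

    def match(self, keywords: list[str]) -> list[str]:
        # Search term matching unit 120: keep rows containing any keyword.
        return [row for row in self.database if any(k in row for k in keywords)]

    def sort(self, results: list[str]) -> list[str]:
        # Sorting unit 130: plain code point order as a placeholder.
        return sorted(results)

    def search(self, raw: bytes) -> list[str]:
        return self.sort(self.match(self.extract(self.convert(raw))))


pipeline = RetrievalPipeline(
    database=["24시 토론마당", "MBC 심야토론", "KBS 금요 토론 마당"]
)
results = pipeline.search("심야토론".encode("euc_kr"))
print(results)
```

Each stage is a separate method so that any one of them could be swapped for a different algorithm without touching the others, which is the property the description relies on.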

The operation of the information retrieval apparatus of FIG. 1 will now be described in detail in conjunction with the flowchart of FIG. 3, which shows an embodiment of the information retrieval method according to the present invention.

The code conversion unit 100 converts the input character string into a multilingual-compatible character code, for example Unicode (step 300). A character code is a one-to-one mapping between a set of characters and the numbers assigned to represent them.

Character codes exist in many varieties depending on the language or country in use, and a character encoding maps them to a sequence of bits with binary values of 0 and 1 that a computer can process.

The ASCII encoding scheme uses 7 bits and can therefore represent at most 128 characters. The ISO-8859-1 encoding scheme is a character set created by adding the characters used in Western European countries to the existing ASCII character set. As an extension of ASCII, it cannot fit within the 7-bit code adopted by ASCII, so it uses an 8-bit (1-byte) code.
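The 7-bit and 8-bit limits can be illustrated with Python's built-in codecs. This is a small sketch; the sample word "café" is an arbitrary choice for the example.

```python
text = "café"

# ISO-8859-1 stores every character in exactly one byte (values 0-255).
latin1_bytes = text.encode("iso-8859-1")
print(latin1_bytes, len(latin1_bytes))

# ASCII only covers code values 0-127, so "é" (value 233) cannot be encoded.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("not representable in ASCII:", exc.object[exc.start:exc.end])

# A string is ASCII-safe exactly when every code value is below 128.
fits_in_seven_bits = all(ord(ch) < 128 for ch in "cafe")
```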

Representative character codes used in each country or region are as follows. Europe uses the ISO 8859 series and ISO 6937; the Middle East uses the ISO 8859 series; China uses GB2312-80, GBK, and BIG5; Japan uses JIS; and Korea uses KS X 1001. Each of these is a local character code.

Unicode was introduced to overcome the limits of ASCII and to be compatible with all of the world's languages. It is a character code designed with internationalization in mind so that every language people use can be represented, and it is a single large character set devised to encompass the existing encoding schemes of individual languages.

As described above, the character code of the input character string differs depending on the language or country in which the information retrieval apparatus is used. The code conversion unit 100 converts these differing character codes into a unified multilingual-compatible character code, for example Unicode.
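As a minimal sketch of this conversion step, the following uses Python's standard codecs, in which `str` values are Unicode. The function name `to_unicode` and the choice of sample encodings are assumptions made for illustration; the document does not prescribe how the source encoding is determined.

```python
def to_unicode(raw: bytes, source_encoding: str) -> str:
    """Hypothetical code conversion: decode locally encoded bytes into Unicode."""
    return raw.decode(source_encoding)


# The same pipeline receives input in different local character codes.
samples = {
    "euc_kr": "심야토론".encode("euc_kr"),        # Korean, KS X 1001 based
    "shift_jis": "検索".encode("shift_jis"),      # Japanese, JIS based
    "iso-8859-1": "café".encode("iso-8859-1"),    # Western European
}

# After conversion every string is in one unified code, whatever its origin.
unified = {enc: to_unicode(raw, enc) for enc, raw in samples.items()}
for enc, text in unified.items():
    print(enc, [hex(ord(ch)) for ch in text])
```

Once every input is a Unicode string, the later stages no longer need to know which local code the user's device produced.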

The series of database search operations performed by the keyword extraction unit 110, the search term matching unit 120, and the sorting unit 130 is preferably based on the converted unified multilingual-compatible character code and character encoding, for example Unicode.

In addition to Unicode, the code conversion unit 100 may also support a local character code suited to the language or country in use, and may convert the input character string into that local character code.

The keyword extraction unit 110 extracts a meaningful keyword from the converted character string (step 310). Because the most efficient keyword extraction algorithm or method can differ with the usage environment, such as the language, country, or product in use, the keyword extraction unit 110 preferably applies one of a plurality of keyword extraction algorithms and methods adaptively according to that environment. That is, the keyword extraction unit 110 holds information on a plurality of keyword extraction algorithms and methods, selects one of them based on the usage environment, and extracts the keyword from the input character string using the selected algorithm or method.
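The adaptive selection described above can be sketched as a lookup table of extraction functions. The two extractors shown (whitespace tokens and character bigrams), the table `EXTRACTORS`, and the mapping from language codes to extractors are hypothetical choices for illustration; they are not the embodiments listed in Table 1.

```python
from typing import Callable

Extractor = Callable[[str], list[str]]


def whitespace_tokens(text: str) -> list[str]:
    # Split on spaces; suits languages that delimit words with whitespace.
    return text.split()


def char_bigrams(text: str) -> list[str]:
    # Overlapping two-character windows, ignoring spaces.
    compact = "".join(text.split())
    return [compact[i:i + 2] for i in range(len(compact) - 1)]


# Hypothetical registry: usage environment (here just a language code) -> method.
EXTRACTORS: dict[str, Extractor] = {
    "en": whitespace_tokens,
    "ko": char_bigrams,
}


def extract_keywords(text: str, language: str) -> list[str]:
    # Fall back to whitespace tokens for environments not in the registry.
    extractor = EXTRACTORS.get(language, whitespace_tokens)
    return extractor(text)


print(extract_keywords("late night talk", "en"))
print(extract_keywords("심야토론", "ko"))
```

Adding support for a new market then means registering one more function rather than changing the structure of the apparatus.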

Table 1 below shows embodiments of keyword extraction algorithms and methods for the case where the character string input by the user for the search is "생방송 심야토론" ("live broadcast late-night debate").

The keyword extraction unit 110 may support a plurality of keyword extraction algorithms and methods such as those described above.

The search term matching unit 120 matches the extracted keyword against the data stored in the database 140 and outputs the matched data as the information search result (step 320). That is, the search term matching unit 120 detects, among the data stored in the database 140, the data that can correspond to the keyword.

For example, the search term matching unit 120 may output, as the information search result, data that includes all of the extracted keywords, data that includes at least one of the extracted keywords, data that includes the character string constituting an extracted keyword, or data that includes that character string in reverse order.
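The matching variants listed above can be sketched as plain substring tests. The function names are hypothetical, and the sample rows are the database entries quoted in this document; the sketch does not reproduce the contents of Table 2.

```python
def match_all(keywords: list[str], rows: list[str]) -> list[str]:
    # Rows that contain every extracted keyword.
    return [row for row in rows if all(k in row for k in keywords)]


def match_any(keywords: list[str], rows: list[str]) -> list[str]:
    # Rows that contain at least one extracted keyword.
    return [row for row in rows if any(k in row for k in keywords)]


def match_string(keyword: str, rows: list[str], reverse: bool = False) -> list[str]:
    # Rows that contain the keyword's character string, optionally reversed first.
    needle = keyword[::-1] if reverse else keyword
    return [row for row in rows if needle in row]


rows = ["24시 토론마당", "MBC 심야토론", "KBS 금요 토론 마당"]
keywords = ["심야", "토론"]

print(match_all(keywords, rows))                      # only the row with both
print(match_any(keywords, rows))                      # every row has "토론"
print(match_string("론토야심", rows, reverse=True))   # reversed to "심야토론"
```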

The search term matching unit 120 preferably supports the various search term matching algorithms and methods described above, and may select and apply the most efficient of the plurality of matching algorithms and methods according to the usage environment, such as the language, country, or product in use.

Table 2 below shows embodiments of search term matching algorithms and methods for the case where the character string input by the user for the search is "심야토론" ("late-night debate") and the data stored in the database 140 are "24시 토론마당" ("24-hour debate forum"), "MBC 심야토론" ("MBC late-night debate"), and "KBS 금요 토론 마당" ("KBS Friday debate forum").

The search term matching unit 120 may support a plurality of search term matching algorithms and methods such as those described above.

The sorting unit 130 sorts the information search results in a predetermined order and displays them so that the user can view them (step 330). The sorting unit 130 preferably sorts the results in the order that users find most comfortable for the usage environment, such as the language, country, or product in use.

To that end, the sorting unit 130 preferably supports various sort orders, and may select and apply the most efficient of the plurality of sort orders according to the usage environment, such as the language, country, or product in use.

In Unicode, the sort order by code value does not coincide with alphabetical order. Instead, it follows the fixed order in which the scripts of each language are laid out by code value, for example Basic Latin and Latin Extended, Greek and Coptic, Cyrillic and Cyrillic Supplement, and Armenian.
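The difference between code value order and an order a user might expect can be seen with a short sketch. The sort key `folded_key` (accents removed, case ignored) and the table `SORT_ORDERS` are hypothetical stand-ins for a selectable, environment-specific sort order; a production system would more likely use a full locale-aware collation.

```python
import unicodedata

words = ["banana", "Zebra", "apple", "éclair", "αλφα"]


def folded_key(word: str) -> str:
    """Hypothetical sort key: drop accents and ignore case."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


# Hypothetical table of selectable sort orders; None means raw code value order.
SORT_ORDERS = {"code_point": None, "folded": folded_key}


def sort_results(results: list[str], order: str = "code_point") -> list[str]:
    return sorted(results, key=SORT_ORDERS[order])


print(sort_results(words))            # uppercase "Zebra" sorts before "apple"
print(sort_results(words, "folded"))  # "apple" first, "Zebra" after "éclair"
```

In both orders the Greek word comes last, because its code values lie in a later block than the Latin letters.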

FIG. 2 is a block diagram showing an embodiment of the configuration of an information retrieval apparatus according to the present invention. The illustrated apparatus includes a code conversion unit 200, a keyword extraction unit 210, a search term matching unit 220, a sorting unit 230, a database management system (DBMS) 240, a database search unit 250, and a database 260. Operations of the apparatus of FIG. 2 that are the same as those already described with reference to FIGS. 1 and 3 are not described again.

The database management system (DBMS) 240 is a system that makes it quick and easy to manage the large volume of data stored in the database 260, including adding, deleting, and searching. It is a program that lets multiple users record data in, or access data from, the database 260.

The DBMS 240 integrates data scattered across individual application programs and lets each application share the integrated data, enabling systematic use of the information.

The DBMS 240 may serve to: i) define the structure of the stored data; ii) store data according to that structure; iii) retrieve and update data through a database language; iv) control concurrent data processing by multiple users; v) return to the state before an update when a fault occurs during the update; and vi) protect the confidentiality (security) of the information.
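Role v), returning to the pre-update state when a fault occurs, can be illustrated with the SQLite engine bundled with Python. This is a sketch only: SQLite, the table name `programs`, and the deliberately failing insert are assumptions chosen for the example, and the document does not name any particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE programs (title TEXT NOT NULL)")
conn.execute("INSERT INTO programs VALUES ('MBC 심야토론')")
conn.commit()

try:
    # The connection as a context manager commits on success and rolls the
    # whole transaction back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE programs SET title = 'draft'")
        conn.execute("INSERT INTO programs VALUES (NULL)")  # violates NOT NULL
except sqlite3.IntegrityError:
    pass  # the failed update has already been undone

titles = [row[0] for row in conn.execute("SELECT title FROM programs")]
print(titles)
```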

The DBMS 240 manages the access of the search term matching unit 220 to the database 260 so that the search term matching unit 220 can output, as the information search result, the data stored in the database 260 that matches the keyword.

As shown in FIG. 2, in order to provide various search methods according to the usage environment, such as the language, country, or product in use, the information retrieval apparatus according to the present invention may include a specific part of the DBMS 240, for example the database search unit 250, configured as an independent component.

By configuring the database search unit 250 independently in this way, the database search unit 250 can be kept from being affected by the type of database management system in use.
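One way to picture this independence is a small search interface with interchangeable back ends. The sketch below is an assumption-laden illustration: the `DatabaseSearcher` protocol, the two implementations, the `programs` table, and the use of SQLite are all hypothetical and are not taken from this document.

```python
import sqlite3
from typing import Protocol


class DatabaseSearcher(Protocol):
    """Hypothetical interface for an independent database search unit."""
    def find(self, keyword: str) -> list[str]: ...


class InMemorySearcher:
    def __init__(self, rows: list[str]) -> None:
        self.rows = list(rows)

    def find(self, keyword: str) -> list[str]:
        return [row for row in self.rows if keyword in row]


class SqliteSearcher:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def find(self, keyword: str) -> list[str]:
        # instr() returns a position greater than 0 when the keyword occurs.
        cur = self.conn.execute(
            "SELECT title FROM programs WHERE instr(title, ?) > 0 ORDER BY rowid",
            (keyword,),
        )
        return [row[0] for row in cur]


def search(searcher: DatabaseSearcher, keyword: str) -> list[str]:
    # The caller depends only on the interface, not on the storage engine.
    return searcher.find(keyword)


titles = ["24시 토론마당", "MBC 심야토론", "KBS 금요 토론 마당"]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE programs (title TEXT)")
conn.executemany("INSERT INTO programs (title) VALUES (?)", [(t,) for t in titles])

print(search(InMemorySearcher(titles), "심야"))
print(search(SqliteSearcher(conn), "심야"))
```

Both back ends return the same result for the same keyword, so the matching stage does not change when the underlying data management system does.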

The information retrieval apparatus according to the present invention may also include a main application implemented so that it is easy to use from a higher-level module.

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, as well as media implemented in the form of a carrier wave (for example, transmission over the Internet).

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to those specific embodiments. Those of ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the invention as set out in the claims, and such modifications should not be understood separately from the technical idea or scope of the present invention.

According to the information retrieval apparatus and method of the present invention described above, a character string input by the user for information retrieval is converted into a preset multilingual-compatible character code and used for the database search, and a variety of keyword extraction and search term matching methods are provided according to the characteristics of the product market. As a result, data stored in the database can be searched efficiently according to the language, country, and product characteristics in use, without substantially changing the structure of the retrieval apparatus.

Claims

A multilingual-compatible information retrieval apparatus, comprising:

a code conversion unit that converts an input character string into a predetermined multilingual-compatible character code;

a keyword extraction unit that extracts a keyword from the converted character string;

a database that stores data; and

a search term matching unit that matches the extracted keyword against the data stored in the database and outputs the result as an information search result.

The apparatus of claim 1,

wherein the predetermined character code is Unicode.

The apparatus of claim 1, wherein the keyword extraction unit

extracts the keyword by using dictionary information.

The apparatus of claim 1, wherein the keyword extraction unit

selects at least one of a plurality of keyword extraction methods and extracts the keyword using the selected method.

The apparatus of claim 1, wherein the search term matching unit

outputs data including all of a plurality of extracted keywords, outputs data including at least one of the plurality of extracted keywords, or outputs data including the character string of an extracted keyword in reverse order.

The apparatus of claim 1, wherein the search term matching unit

selects at least one of a plurality of search term matching methods and matches the keyword against the data using the selected method.

The apparatus of claim 6, wherein the search term matching unit

selects the search term matching method to be used based on at least one of the language, country, and product in use.

The apparatus of claim 1,

further comprising a sorting unit that determines a sort order based on at least one of the language, country, and product in use, and sorts the information search results according to the determined sort order.

A multilingual-compatible information retrieval method, comprising:

converting an input character string into a predetermined multilingual-compatible character code;

extracting a keyword from the converted character string; and

matching the extracted keyword against data stored in a database and outputting the result as an information search result.

The method of claim 9,

wherein the predetermined character code is Unicode.

The method of claim 9, wherein extracting the keyword comprises:

selecting at least one of a plurality of keyword extraction methods based on at least one of the language, country, and product in use; and

extracting the keyword from the converted character string using the selected method.

The method of claim 9, wherein matching the keyword comprises:

selecting at least one of a plurality of search term matching methods based on at least one of the language, country, and product in use; and

matching the keyword against the data using the selected method.

The method of claim 9, further comprising:

determining a sort order based on at least one of the language, country, and product in use; and

sorting the information search results according to the determined sort order.

A computer-readable recording medium having recorded thereon a program for executing the method according to any one of claims 9 to 13.