KR970010030B1

KR970010030B1 - Picture search system

Info

Publication number: KR970010030B1
Application number: KR1019930029617A
Authority: KR
Inventors: 전미선; 진청희; 박세영; 최동시; 송경준; 천대녕
Original assignee: 양승택; 한국전자통신연구원
Priority date: 1993-12-24
Filing date: 1993-12-24
Publication date: 1997-06-20
Also published as: KR950020255A

Abstract

A user's query is entered into the system through a keyboard (10), and the natural language query entered through the keyboard (10) is passed to a microprocessor (20). According to the query, the microprocessor (20) extracts index words using the various dictionaries stored in a mass storage device (30) and places them in main memory (40). The microprocessor (20) computes the frequency of each index word, determines the priority of the index words from those frequencies, and sorts the index words by priority. The search results are used to output the photo data stored in the mass storage device (30) to a high-resolution screen (50).

Description

Photo retrieval system

FIG. 1 is a block diagram of a hardware system for implementing the present invention.

FIG. 2 is a diagram showing the configuration and data processing flow of a software system for implementing the present invention.

FIG. 3 is the initial screen of the photo retrieval system.

FIG. 4 is a part of the icon search unit screen.

FIG. 5 is the screen of the natural language query input unit.

FIGS. 6A to 6G are flowcharts showing the operation of each component of the photo retrieval system according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

10: keyboard    20: microprocessor

30: mass storage device    40: main memory

50: screen output device    60: mouse

The present invention relates to a system that lets a user enter a natural language query into a computer to retrieve a desired photo, or data related to it, from a large volume of photo data stored in a PC-class computer.

In today's so-called information society, the demands of computer users are growing in both number and variety, and as society develops there is a strong need to process very large amounts of data.

Database management systems have therefore already been developed and put to practical use, with the aim of managing large volumes of information efficiently and giving users as much convenience as possible.

Most companies and public organizations are devoting considerable effort to managing and manipulating the very large databases that this development demands.

In recent years, the field of processing large amounts of image information has advanced rapidly.

However, the query languages used for user interaction when retrieving image information, such as SQL (Structured Query Language), PSQL (Pictorial Structured Query Language), QPE (Query-by-Pictorial-Example), and QBE (Query-By-Example), are difficult for users to use and have the further drawback that the user must remember the grammar of the query language.

To improve on this, securing a smooth means of communication between humans and computers is a very important problem.

Solving this problem requires examining the behavioral characteristics of the humans who are at the center of use and, where possible, research aimed at making the fullest use of the five human senses.

People generally feel fear or discomfort toward computers, so the first efficient user interface approach that comes to mind for putting future information to practical use is to retrieve information using everyday language, which requires no learning from the user.

There is therefore a pressing need for information retrieval through a natural language interface. User interfaces based on natural language have been developed abroad mainly as front-end systems for databases, and commercial systems have now appeared.

In Korea, however, natural language interface development has not yet been carried out systematically.

The database systems and data retrieval systems in wide use to date allow stored data to be retrieved only through the data structures tied to each system and through formalized methods, rather than through ordinary ways of asking questions.

For this reason, ordinary users need a long training period, with its attendant cost, before they can use a newly installed database system or data retrieval system.

An object of the present invention is to provide a system that lets stored data be retrieved in the language people use (that is, natural language), so that even users with no special knowledge of computers can easily approach and use the system.

In the present invention, so that the user can efficiently retrieve the desired information, the natural language query is analyzed to extract the terms (TERM) that serve as index words of the information retrieval system.

The present invention also builds the dictionaries needed for accurate retrieval: a stopword dictionary, a second term dictionary (전기어 in the original text), and a synonym dictionary. In addition to the natural language interface, it includes a menu-style interface that mainly uses graphics based on a room metaphor.

Here, the room metaphor is a metaphorical representation of an office environment made up of everyday objects such as a clock, a calculator, a printer, a memo pad, and books.

The present invention also introduces a hierarchical graphical search interface that uses icons depicting the classification (major category, middle category, minor category) of each stored photo.

FIG. 1 shows the hardware configuration for realizing the photo retrieval system according to the present invention. The operation of each component is described below.

The user's query is entered into the system through the keyboard (10), and the natural language query entered through the keyboard (10) is passed to the microprocessor (20).

According to the query received, the microprocessor (20) extracts index words using the various dictionaries stored in advance in the mass storage device (30) and places them in the main memory (40).

The microprocessor (20) then computes the frequency of each index word, determines the priority of the index words according to frequency, and sorts the index words by the priority so determined. All index words are held in the main memory (40) so that the frequency computation runs quickly.

The results obtained in this way are used to output the photo data stored in the mass storage device (30) to the high-resolution screen (50).

The photo data is stored in the mass storage device (30) in a form compressed by a particular compression algorithm. The microprocessor (20) restores this data to its pre-compression form, and it is reproduced on the high-resolution screen (50) as the original photo.

Through the mouse (60), the user can move around within the system, select objects, and extract photo data.

FIG. 2 shows the configuration of the functional means of the system that performs photo retrieval through natural language queries according to the present invention, and the order in which those means operate.

Each of these means is realized by software stored in the microprocessor (20). That is, the microprocessor (20) includes the icon search unit (21), natural language query input unit (22), particle separation unit (23), index word search unit (24), frequency calculation unit (25), photo data output unit (26), and screen output unit (27) shown in FIG. 2. These functional means are described in detail below.

To start the photo retrieval system, the user types, for example, sara3 at the DOS prompt through the keyboard (10), and the logo screen of the photo retrieval system according to the present invention appears. Pressing any key then moves to the initial screen, which depicts an office environment as shown in FIG. 3.

The objects on the initial screen are described below.

· Bookshelf: When clicked with the mouse (60), an enlarged bookshelf is displayed; when the user picks a desired book, the system switches directly to book-viewing mode.

· Clock: Displayed enlarged, it shows the current time.

· Calculator: Displayed enlarged, it performs the four basic arithmetic operations.

· Memo pad: Displayed enlarged, it lets the user take notes with the mouse (60) and page forward and backward through them.

· Door: The screen gradually fades out, work in the room environment ends, and the system exits to DOS.

· Book: Switches from the room environment to book-viewing mode.

The contents of the book are spread across two pages: the first screen shows icons for the major categories, the next screen icons for the middle categories, and the screen after that icons for the minor categories. The user can search for the desired photo by clicking these icons, and the search results are shown three photos to a page, each with its title.

· Lady: A welcome message for visiting the photo retrieval system is presented with animation and sound. The user is told that they can type the code of the photo they are looking for into the designated dialog box with the keyboard, or ask in everyday natural language, free of any fixed format, as they would ask another person, for example "I would like to know about the Arc de Triomphe in Paris."

From this initial screen, the user can choose the search method to use.

That is, the user must choose between search using the icon-based classification table and natural language query search. The choice is made by whether the user selects the lady: if the user selects the lady, the query input dialog box is displayed; otherwise, the icon search screen based on the classification table is displayed.

As shown in FIG. 6A, the icon search unit (21) performs a search by the categories the user selects (major, middle, minor) and then extracts the identifiers of the matching photo data.

Its result is passed directly to the photo data output unit (26). One of the screens output by the icon search unit (21) is shown as an example in FIG. 4.
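The category-by-category lookup performed by the icon search unit can be pictured with a small sketch. It is an illustration only: the nested dictionary, the category names, and the function `find_photo_ids` are hypothetical, and the identifiers simply reuse card numbers that appear elsewhere in this description.

```python
# Hypothetical classification table: major -> middle -> minor -> photo identifiers.
# None of these category names come from the patent.
CLASSIFICATION = {
    "Asia": {
        "Japan": {
            "houses": [11060, 11072],
            "festivals": [11063],
        },
    },
}


def find_photo_ids(table, major, middle, minor):
    """Return the photo identifiers filed under the selected category path."""
    return list(table.get(major, {}).get(middle, {}).get(minor, []))


print(find_photo_ids(CLASSIFICATION, "Asia", "Japan", "houses"))
```

An unknown category path yields an empty list rather than an error, which matches the idea that the user can only reach photos by clicking icons that exist.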

The natural language query input unit (22) is solely responsible for moving the query, composed of Korean and English text typed by the user through the keyboard (10), from the dialog box into the main memory (40).

FIG. 5 shows the query input screen, and FIG. 6B shows the detailed operation of the natural language query input unit (22).

To process the query passed to the main memory (40) by the natural language query input unit (22), the particle separation unit (23) first extracts only the nouns. To do so, it uses the various dictionaries prepared in advance to separate off the particles (postpositions) and keep only the nouns. For example, given the query "Please show me the scenery of thatched houses in Japan", the particle separation unit extracts only the nouns "Japan", "thatched house", and "scenery".

The detailed operation of the particle separation unit (23) is shown in FIG. 6C.
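A minimal sketch of particle separation follows, using the Korean form of the example query (일본의 초가집 풍경에 대하여 보여주세요). The particle list, the stopword set, and the function `extract_nouns` are assumptions made for illustration; the patent relies on its own dictionaries, whose contents are not given here.

```python
# Hypothetical stand-ins for the system's dictionaries (not taken from the patent).
PARTICLES = ("에게", "에서", "으로", "의", "에", "을", "를", "은", "는", "이", "가")
STOPWORDS = {"대하여", "보여주세요", "알고", "싶다"}

# Try longer particles first so that "에서" wins over "에".
PARTICLES_BY_LENGTH = sorted(PARTICLES, key=len, reverse=True)


def extract_nouns(query):
    """Drop stopwords, then strip one trailing particle from each remaining word."""
    nouns = []
    for word in query.split():
        if word in STOPWORDS:
            continue
        for particle in PARTICLES_BY_LENGTH:
            # Keep at least one character of the stem.
            if word.endswith(particle) and len(word) > len(particle):
                word = word[: -len(particle)]
                break
        nouns.append(word)
    return nouns


print(extract_nouns("일본의 초가집 풍경에 대하여 보여주세요"))
```

Plain suffix stripping like this would damage nouns that happen to end in a particle syllable, which is one reason a dictionary-based approach, as described above, is the more robust choice.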

In conventional information retrieval, data with a fixed structure was usually indexed by hand, keyword by keyword, stored in a database, and retrieved with a specific query language. The data used in this system, however, includes unstructured data as well as structured data.

For a concrete understanding of the system, consider the data structure of a sample photographic slide film used in the system, as given in Table 1.

Table 1 gives the data associated with a single slide film used in the system of the present invention. Each such record has, of course, a corresponding slide photo.

As Table 1 shows, the classification number, unique number, title, photo size, color, photographer, and date taken have the structure of a typical structured database.

These data become the index of the photo retrieval system and are used as keywords during retrieval.

The photo description field is unstructured text data and could not be used for index words in an ordinary database, but in this system the words in the description are indexed automatically by a full-text indexing technique and used for retrieval. The slide photo itself, however, is non-text unstructured data; it cannot be used as an index word and serves only as data presented to the user on the screen or a printer.
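Full-text indexing of the description field can be sketched as an inverted index that records, for each word, the cards it occurs in and how often. The card contents and the function `build_index` below are hypothetical; the sketch assumes each description has already been reduced to a list of nouns.

```python
from collections import Counter, defaultdict

# Hypothetical cards: card number -> nouns taken from the photo description.
CARDS = {
    11060: ["Japan", "thatched house", "scenery"],
    11063: ["Japan", "festival", "Japan"],
}


def build_index(cards):
    """Build a mapping: index word -> {card number: frequency in that card}."""
    index = defaultdict(dict)
    for card_no, words in cards.items():
        for word, freq in Counter(words).items():
            index[word][card_no] = freq
    return dict(index)


index = build_index(CARDS)
print(index["Japan"])
```

Each entry has the same shape as the index entries discussed later in the description: a list of cards together with the number of times the word is used in each.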

For data with this kind of structure, existing retrieval systems can search only the structured fields described above.

For example, retrieving a photo with the contents of Table 1 in an ordinary database retrieval system requires a highly formalized language such as the following (when using SQL in a relational database).

......

SELECT * FROM slide_film_data

WHERE title = 'Mexican music' AND photographer = 'S. W. Frank' …

Moreover, when the user knows very little about the slide film being sought, satisfactory search results are hard to expect.

Even when results are returned, there will probably be so many that the user has to go through them again by eye and by hand.

In this system, by contrast, even without information about a specific slide film, the user can ask in natural language for the kind of photo wanted, and the system displays the relevant slide films and their detailed descriptions on the computer screen.

Next, the search operations in the index word search unit (24) and the frequency calculation unit (25) are described.

From the query "Please show me the scenery of thatched houses in Japan", the particle separation unit (23) generates [Japan, thatched house, scenery] and passes this to the index word search unit (24). The search in the photo retrieval system is then carried out around these terms, as shown in FIG. 6D.

Once nouns such as Japan, thatched house, and scenery have been generated as search terms, the entry for each noun is obtained from a keyword index file built beforehand by another system. Suppose the result is as shown in Table 2.

The index entry for the index word "Japan" means that the index word is used in four cards in all: once in card 11060, twice in card 11063, once in card 11067, and once in card 11072.

It also shows that the word is used five times overall (1+2+1+1). A card here means one slide film together with its associated description.
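The two figures read off the "Japan" entry, the number of cards and the total number of uses, can be checked with a few lines. The dictionary layout is an assumption chosen for illustration; the card numbers and frequencies are the ones stated above.

```python
# Index entry for "Japan" as described above: card number -> frequency in that card.
japan_entry = {11060: 1, 11063: 2, 11067: 1, 11072: 1}

card_count = len(japan_entry)           # number of cards that use the word
total_uses = sum(japan_entry.values())  # 1 + 2 + 1 + 1

print(card_count, total_uses)  # prints "4 5"
```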

The card that should be shown in response to the user's question is the one with the highest frequency when the similarity among the keywords appearing in the question is computed.

This search is performed using the index lookup results and, depending on how the similarity is computed, is divided into a logical sum (OR) operation and a logical product (AND) operation.

This work is handled by the frequency calculation unit (25), which carries out the process shown in FIG. 6E. First, in preparation for computing the similarity among index words, the index lookup results are stored as an array of records (each with two elements: 1. card number, 2. frequency in that card), and the array is sorted by card number. The result of this preparatory step is organized as shown in Table 3.

Consider first the OR operation. If a particular card containing what the user is asking about really exists, most of the index words in the question will appear in the content of that card.

For example, if card 11060 contains Japan, thatched house, and scenery, then card 11060 will appear three times (once for each index word), as shown in Table 2.

Therefore, if the index words in the question are concentrated in a particular card, that card can be judged to be the one the user wants. From Table 3, then, a card whose number is repeated or whose frequency of occurrence is high can be regarded as having higher priority as a search result.

The degree of priority as a search result is called the weight, and the algorithm for computing the weight of each card is given in Table 4.

Each card number and its weight produced by this algorithm are stored as an array of records (two elements: 1. card number, 2. weight), which is then sorted by weight to form the final search result (Table 5).
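One plausible reading of the weighting step is sketched here. The contents of Table 4 are not reproduced in this text, so the rule used, summing the frequencies of all records that share a card number, is an assumption, as are the sample records (only the four "Japan" frequencies come from the description) and the function `or_search`.

```python
from itertools import groupby

# (card number, frequency) records, already sorted by card number.
# Partly hypothetical sample data.
RECORDS = [
    (11060, 1), (11060, 1), (11060, 1),
    (11063, 2),
    (11065, 1),
    (11067, 1),
    (11072, 1), (11072, 2), (11072, 1),
]


def or_search(records):
    """Assumed weighting: a card's weight is the sum of its record frequencies."""
    weights = [
        (card_no, sum(freq for _, freq in group))
        for card_no, group in groupby(records, key=lambda record: record[0])
    ]
    # Highest weight first; ties keep their card-number order (stable sort).
    return sorted(weights, key=lambda pair: pair[1], reverse=True)


for card_no, weight in or_search(RECORDS):
    print(card_no, weight)
```

Because the records are sorted by card number beforehand, a single pass with `groupby` is enough to total each card, which fits the description's emphasis on keeping the frequency computation fast.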

Now consider the difference between the OR operation and the AND operation.

Looking at Table 5, the result of the OR operation, only two cards, numbers 11060 and 11072, contain all three keywords in the question. We can surmise that these two are the cards the user wants.

The OR operation, however, returns as a search result every card in which any index word from the query is used even once, so in some cases it incurs unnecessary computation time and gives the user results that have little to do with the question.

A function is therefore needed that can pick out only the cards containing every index word in the query; this is the AND operation.

Table 6 gives the algorithm needed for the AND operation.
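The AND operation can be sketched as an intersection of the card sets for each index word. The contents of Table 6 are not reproduced in this text, so this is an assumed implementation; the function `and_search` and the entries for "thatched house" and "scenery" are hypothetical, chosen so that cards 11060 and 11072 contain all three words as stated above.

```python
INDEX_RESULTS = {
    "Japan": {11060: 1, 11063: 2, 11067: 1, 11072: 1},  # from the description
    "thatched house": {11060: 1, 11072: 2},              # hypothetical
    "scenery": {11060: 1, 11065: 1, 11072: 1},           # hypothetical
}


def and_search(index_results):
    """Keep only cards present in every index entry, weighted by summed frequency."""
    postings = list(index_results.values())
    if not postings:
        return []
    common = set(postings[0])
    for entry in postings[1:]:
        common &= set(entry)
    weights = [(card_no, sum(entry[card_no] for entry in postings)) for card_no in common]
    # Highest weight first; break ties by card number for a deterministic order.
    return sorted(weights, key=lambda pair: (-pair[1], pair[0]))


print(and_search(INDEX_RESULTS))
```

Cards 11063, 11065, and 11067 drop out because each lacks at least one of the three index words, which is the narrowing effect the AND operation is meant to provide.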

As explained above, the AND operation can in some cases save computation time compared with the OR operation, but users often do need the OR operation, so the system is configured to let the user choose the operation that suits the requirement.

The photo data output unit (26) is solely responsible for extracting the relevant photo data from the photo database on the basis of the cards produced by the frequency calculation unit (25). Its detailed operation is shown in FIG. 6F.

The image storage and compression algorithm used in this photo retrieval system is chosen to be of modest complexity, in order to improve system performance, that is, the effectiveness of real-time retrieval.

A basic requirement of a photo retrieval system is that it can retrieve a large number of photos quickly. If a highly complex algorithm is used, the resulting loss of system performance must not be ignored, however much storage space it saves. The screen output unit (27) of this system is therefore designed with this requirement particularly in mind.
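The trade-off between compression effort and speed can be explored with a small experiment. The patent does not name its compression algorithm, so Python's standard `zlib` module is used here purely as a stand-in, and the byte pattern is synthetic rather than real photo data; `roundtrip` is a hypothetical helper.

```python
import time
import zlib

# Synthetic stand-in for image bytes (not real photo data).
DATA = bytes(range(256)) * 4096


def roundtrip(data, level):
    """Compress and restore data at a zlib level; return (packed size, seconds)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    restored = zlib.decompress(packed)
    elapsed = time.perf_counter() - start
    if restored != data:
        raise ValueError("round trip did not restore the original bytes")
    return len(packed), elapsed


for level in (1, 9):  # 1 = least effort, 9 = most effort
    size, seconds = roundtrip(DATA, level)
    print(f"level={level} packed={size} bytes time={seconds:.4f}s")
```

Timings depend on the machine and the data, so the numbers are only indicative; the point is that a retrieval system can measure both packed size and round-trip time before settling on an algorithm.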

As in the flow shown in FIG. 6G, the screen output unit (27) carries out the task of outputting the photo data extracted by the photo data output unit (26) to the high-resolution screen (50).

To reproduce compressed photo data in real time, the screen output unit (27) is allocated as much of the main memory (40) as is available, so as to minimize execution time.

A processing module suited to a high-resolution graphics output device is written.

Claims

A photo retrieval system comprising: a mass storage device (30) in which compressed photo data is stored and dictionaries are built; a main memory (40); a keyboard (10) for entering a user's natural language query; a microprocessor (20) that, according to the user's natural language query entered through the keyboard (10), performs photo retrieval by extracting index words using the dictionaries stored in advance in the mass storage device (30), placing them in the main memory (40), computing their frequencies, determining their priorities, and sorting the index words by the determined priority; a high-resolution screen (50) for reproducing the photo data stored in the mass storage device (30) according to the results retrieved by the microprocessor (20); and a mouse (60) for moving within the system, selecting objects, and extracting photo data; wherein the microprocessor (20) comprises: an icon search unit (21) that performs a search by the category selected by the user and extracts the identifier of photo data; a natural language query input unit (22) that moves a query composed of Korean and English, entered by the user through the keyboard (10), from a dialog box to the main memory (40); a particle separation unit (23) that extracts only nouns, using the dictionaries, in order to process the query passed to the main memory (40) by the natural language query input unit (22); an index word search unit (24) and a frequency calculation unit (25) that obtain the index entry and the frequency of each index word, respectively; a photo data output unit (26) that extracts the relevant photo data from the photo database on the basis of the cards produced by the frequency calculation unit (25); and a screen output unit (27) that outputs the photo data extracted by the photo data output unit (26) to the high-resolution screen (50).