KR20110116790A

KR20110116790A - Apparatus and method for providing translation service in portable terminal

Info

Publication number: KR20110116790A
Application number: KR1020100036402A
Authority: KR
Inventors: 함종규; 오준섭; 박영희; 심현식; 김학수; 서정연; 선충녕; 이세희; 정형일
Original assignee: 삼성전자주식회사; 강원대학교산학협력단; 서강대학교산학협력단
Priority date: 2010-04-20
Filing date: 2010-04-20
Publication date: 2011-10-26

Abstract

The present invention provides a translation service in a portable terminal. The method for providing the translation service includes: when an input sentence in a first language is entered, extracting at least one concept word from the input sentence; selecting and presenting at least one similar sentence from among candidate sentences stored in constructed data; when the similar sentence selected by the user from among the at least one similar sentence contains a substitution-target concept word, displaying a screen asking whether to substitute the substitution-target concept word; substituting the substitution-target concept word with the concept word contained in the input sentence; and displaying a translated sentence obtained by translating the sentence containing the substituted concept word into a second language.

Description

Apparatus and method for providing translation service in portable terminal

The present invention relates to a portable terminal and, more particularly, to an apparatus and method for providing a translation service in a portable terminal.

Demand for foreign-language conversation assistance is growing in situations that require a foreign language, such as overseas travel, and roaming support has made mobile phones indispensable for travel abroad. Accordingly, conversation-sentence translation services that allow simple and fast retrieval, generation, and speech output of conversational sentences on a portable terminal are under development.

Conventional approaches to foreign-language translation are based on natural-language analysis of the native language and natural-language generation of the foreign language. Approaches that emphasize practicality instead treat translation as a kind of retrieval problem. In the retrieval view, native-language sentences and foreign-language sentences are paired to form a document set; the native-language sentences most similar to the user's input are retrieved, and the foreign-language sentence corresponding to the sentence the user selects from the retrieved set is provided. Because this method always provides a correct foreign-language sentence, even a person with no knowledge of the foreign language can use an expression appropriate to the situation. Retrieving similar sentences involves a method of measuring the similarity of each sentence and a method of extracting the inverted file and index terms used in retrieval.

On portable terminals with limited system resources, such as mobile phones and PDAs (Personal Digital Assistants), capacity limits and the limits of keyword search lead to an incomplete service: rather than typing the desired phrase directly, the user selects a sentence through several presented steps, or, when the phrase is typed directly, only keyword search is applied. In addition, speech output is built from pre-stored audio files for each sentence, so support for speech output is limited when the phrase is typed directly.

Among conventional techniques, translation through language analysis and generation is general-purpose but often fails to handle individual situations, and the generated foreign-language sentences often contain errors, so it is difficult to use in practice. From a practical standpoint, a similarity-based approach can therefore be efficient: collect enough translation pairs, allow user information to be updated, and retrieve similar sentences from that data. Methods that compute similarity directly between sentences are not effective where collecting enough utterance pairs determines performance, because the amount of computation grows in proportion to the number of registered sentences. Building an inverted file and index terms, as in general retrieval, is effective because its performance depends relatively little on the number of candidate sentence pairs. However, such retrieval depends on whether the words that make up the index entries actually occur, so performance can degrade because sentences with the same meaning may consist of entirely different index terms.

Accordingly, an object of the present invention is to provide an apparatus and method for providing a foreign-language translation service in a portable terminal.

Another object of the present invention is to provide an apparatus and method for composing, in a portable terminal, the native-language sentence that the user wants to translate by substituting a specific word.

Yet another object of the present invention is to provide an apparatus and method for extracting concept words from an input sentence in a portable terminal.

According to a first aspect of the present invention for achieving the above objects, a method for providing a translation service that translates a first language into a second language in a portable terminal includes: when an input sentence in the first language is entered, extracting at least one concept word from the input sentence; selecting and presenting at least one similar sentence from among candidate sentences stored in constructed data; when the similar sentence selected by the user from among the at least one similar sentence contains a substitution-target concept word, displaying a screen asking whether to substitute the substitution-target concept word; substituting the substitution-target concept word with the concept word contained in the input sentence; and displaying a translated sentence obtained by translating the sentence containing the substituted concept word into the second language. The substitution-target concept word is a concept word that is contained in the selected similar sentence and that belongs to the same concept as a concept word contained in the input sentence but denotes a different object.

According to a second aspect of the present invention for achieving the above objects, a portable terminal apparatus that provides a translation service for translating a first language into a second language includes: a control unit that, when an input sentence in the first language is entered, extracts at least one concept word from the input sentence and selects and presents at least one similar sentence from among candidate sentences stored in constructed data; and a display unit that, when the similar sentence selected by the user from among the at least one similar sentence contains a substitution-target concept word, displays a screen asking whether to substitute the substitution-target concept word. The control unit substitutes the substitution-target concept word with the concept word contained in the input sentence, and the display unit displays a translated sentence obtained by translating the sentence containing the substituted concept word into the second language. The substitution-target concept word is a concept word that is contained in the selected similar sentence and that belongs to the same concept as a concept word contained in the input sentence but denotes a different object.

By mapping foreign-language sentences to native-language sentences according to meaning and by providing a function for modifying sentences through concept words, a portable terminal can provide an effective translation service.
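The substitution rule stated in the first aspect can be sketched as follows. This is a minimal sketch, assuming concept words are already marked inline in the `{{word;TAG}}` notation that the concept word extraction result uses later in this description. The function names and the English stand-in sentences are hypothetical, and the on-screen confirmation is reduced to a list of proposed substitutions.

```python
import re

# Inline concept-word markup such as "{{New York;LOC}}" (assumed storage format).
CONCEPT = re.compile(r"\{\{([^;{}]+);([A-Z]+)\}\}")

def find_concepts(sentence):
    """Return (word, tag) pairs for every marked concept word."""
    return CONCEPT.findall(sentence)

def substitution_targets(input_sentence, similar_sentence):
    """Concept words in the similar sentence that share a concept (tag) with an
    input concept word but denote a different object: (old, new, tag)."""
    by_tag = {tag: word for word, tag in find_concepts(input_sentence)}
    return [(word, by_tag[tag], tag)
            for word, tag in find_concepts(similar_sentence)
            if tag in by_tag and by_tag[tag] != word]

def substitute(similar_sentence, targets):
    """Replace each target concept word with the one taken from the input."""
    for old, new, tag in targets:
        similar_sentence = similar_sentence.replace(
            "{{%s;%s}}" % (old, tag), "{{%s;%s}}" % (new, tag))
    return similar_sentence

# Hypothetical input and selected similar sentence (English stand-ins).
user_input = "I want to book a {{plane;TRP}} to {{New York;LOC}}"
candidate = "I would like to reserve a {{plane;TRP}} ticket to {{Seoul;LOC}}"

targets = substitution_targets(user_input, candidate)  # what the screen would ask about
result = substitute(candidate, targets)                # sentence handed to translation
```

Here only the location differs between the two sentences, so it is the only substitution proposed; the transportation concept word is identical in both and is left alone.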

FIG. 1 is a diagram illustrating the conceptual structure of a translation system according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an example of a sentence type determination operation according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a concept word extraction process according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating clustering according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating concept word substitution using a bilingual concept word dictionary according to an embodiment of the present invention;
FIG. 6 is a block diagram of a portable terminal according to an embodiment of the present invention; and
FIG. 7 is a diagram illustrating an operating procedure of a portable terminal according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related well-known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the invention.

The following describes a technique for providing a foreign-language translation service in a portable terminal. In the description below, the term portable terminal covers a cellular phone, a Personal Communication System (PCS) phone, a Personal Digital Assistant (PDA), an International Mobile Telecommunication-2000 (IMT-2000) terminal, a laptop computer, a desktop computer, and the like.

FIG. 1 illustrates the conceptual structure of a translation system according to an embodiment of the present invention.

As shown in FIG. 1, the translation system includes a data management module 110, a sentence analysis module 120, and a translation application 130.

The data management module 110 constructs and stores the data needed for translation. It includes a component for the data construction and expansion operation and a component for the constructed-data storage operation.

Data construction and expansion is the process of composing the sentence pairs that are actually provided as retrieval results. It is performed through a step 111 of selecting categories appropriate to each situation and determining the target sentences from the information that can be asked about in each category, a step 113 of expanding sentence pairs according to the meaning of the sentences, and an additional information attachment step 115 of attaching the concept word information and sentence type information contained in each sentence. Each step is described in detail below.

The category organization and target sentence determination step 111 is as follows. One approach to translation uses sentence pairs. In this technique, the foreign-language sentence corresponding to a given native-language sentence is defined in advance, and the corresponding foreign-language sentence is provided when translation of that native-language sentence is requested. The technique is most effective where the range of sentences is restricted. For example, in translation for travel conversation, the sentences frequently used in each situation can be organized into fixed sets, as various travel phrasebooks show. Table 1 below lists the collected categories and the number of sentences belonging to each category.

Table 1

| Category | Sentences | Category | Sentences | Category | Sentences |
|---|---|---|---|---|---|
| PC room | 30 | Department store | 7 | Reservation confirmation/cancellation/change | 25 |
| Dining (객식) | 37 | Bus | 123 | Post office | 50 |
| Transit/Transfer | 27 | Hospital/Pharmacy | 113 | Youth hostel | 15 |
| Airport tourist information center | 28 | Boarding pass | 44 | Bank | 58 |
| Sightseeing | 209 | Lost/Stolen | 38 | Immigration inspection | 37 |
| Viewing | 97 | Accident | 64 | Telephone | 132 |
| Return procedures | 53 | Socializing/Entertainment | 130 | Gas station | 26 |
| In-flight service | 95 | Ship | 29 | Subway/Railway | 108 |
| In-flight meal order | 35 | Customs inspection | 26 | Check-out | 37 |
| Cabin seating | 38 | Finding a shopping center | 9 | Taxi | 97 |
| Souvenir shop | 7 | Baggage claim | 29 | Fast food | 16 |
| Directions | 40 | Supermarket | 4 | Front desk | 52 |
| Car rental | 61 | Restaurant reservation/use | 93 | Flight reservation | 38 |
| Room service | 43 | Meals and decisions | 69 | Hotel restaurant | 12 |
| Duty-free shop | 4 | Meal order | 89 | Total | 2697 |
| Choosing/deciding on goods | 121 | Filling out declaration forms | 16 | | |
| Exchange/return/refund of goods | 28 | Reservation/Check-in | 158 | | |
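The sentence-pair technique described before Table 1 amounts to a lookup in a table prepared in advance. The sketch below is illustrative only: the category names come from Table 1, but the stored sentences, the `SENTENCE_PAIRS` structure, and the `lookup` function are assumptions made for this example, and exact matching stands in for the similarity search the system actually relies on.

```python
# Hypothetical miniature of the constructed data: each category holds
# (native-language sentence, foreign-language sentence) pairs defined in advance.
SENTENCE_PAIRS = {
    "Taxi": [
        ("공항으로 가 주세요.", "Please take me to the airport."),
        ("여기서 세워 주세요.", "Please stop here."),
    ],
    "Bank": [
        ("환전하고 싶습니다.", "I would like to exchange money."),
    ],
}

def lookup(native_sentence, category=None):
    """Return the predefined foreign sentence for an exact native match, or None.

    If a category is given, only that category is searched."""
    categories = [category] if category else list(SENTENCE_PAIRS)
    for name in categories:
        for native, foreign in SENTENCE_PAIRS.get(name, []):
            if native == native_sentence:
                return foreign
    return None

found = lookup("여기서 세워 주세요.")
missing = lookup("환전하고 싶습니다.", category="Taxi")
```

Because the foreign sentence is stored rather than generated, whatever is returned is a correct sentence; the cost is that an input with no stored counterpart yields nothing, which is why the later steps expand the stored sentences and search by similarity.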

The sentence pair expansion step 113 is as follows. In the translation system according to an embodiment of the present invention, the object of retrieval is a sentence. Because there are sentences of similar meaning whose keywords do not match, sentence retrieval is difficult to solve with general retrieval methods. To address this, the present invention collects as many sentences of the same meaning as possible from various sentence sources. The collected sentences are grouped by meaning, and where too few sentences have been collected, sentences that express the same meaning differently are added to build the sentence data.

The additional information attachment step 115 is as follows. Sentences are generally short and can be expressed in many ways. It therefore cannot be guaranteed that users will speak in a form similar to a candidate sentence, so building the bilingual data with a single sentence per meaning can make the candidate sentence data hard to reach. The present invention therefore makes use of native-language sentences that have the same meaning, which requires expanding each collected sentence into sentences of the same meaning. Because finding and managing sentences with the same meaning is a hard problem, sentences whose foreign-language translations are identical are preferably treated as semantically identical. For example, at least three native-language sentences with the same meaning preferably correspond to one foreign-language sentence. In addition, when sentences are retrieved by sentence comparison, there may be sentences that are similar in meaning but serve different purposes. For example, sentences requesting a ticket to a particular country are identical except for the country name. Because information collected from various resources does not account for items that can change with the situation in this way, the present invention defines such changing items, of which the country name is one. For example, the types of changing items are as shown in Table 2 below.

Table 2

| Main category | Subcategory | Description | Tag |
|---|---|---|---|
| Person | Person | Hong Gil-dong; also includes forms of address such as 'mister' (아저씨) | PER |
| Location | Location | Place names (country, city, etc.) | LOC |
| Location | Restaurant | McDonald's, VIIPS, 'restaurant' | RST |
| Location | Hotel | Hilton Hotel, Seoul Hotel | HTL |
| Location | Airport | Incheon Airport, Gimpo Airport | AIR |
| Location | Station | Seoul Station, Sinchon Station | STA |
| Location | Store | Hyundai Department Store, 7-Eleven | STO |
| Location | Position | Buildings, offices, and places of kinds not defined above | POS |
| Object | Produce | Air conditioner, bag | PRD |
| Object | Brand | Levi's, Louis Vuitton | BRD |
| Object | Food | Food names | FOD |
| Object | Beverage | Drink names | BEV |
| Num | Money | 10 dollars, 1000 won | MNY |
| Num | Time | 10 o'clock, 2 p.m. | TIM |
| Num | Day | July 1, the 5th of next month | DAY |
| Num | Duration | Period, 1 hour | DUR |
| Num | Telephone | Telephone number | TEL |
| Num | Number | Numbers of kinds not defined above | NUM |
| Tool | Transportation | Means of transportation such as subway and bus | TRP |
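One way to picture the role of these changing items: once the variable word is reduced to its Table 2 tag, sentences that differ only in that word collapse to the same template. The sketch below is a hedged illustration, not the patent's storage format; it assumes concept words are marked inline as `{{word;TAG}}`, and `to_template` and the English stand-in sentences are hypothetical.

```python
import re

# The tag column of Table 2.
TABLE2_TAGS = {
    "PER", "LOC", "RST", "HTL", "AIR", "STA", "STO", "POS", "PRD", "BRD",
    "FOD", "BEV", "MNY", "TIM", "DAY", "DUR", "TEL", "NUM", "TRP",
}

MARKED = re.compile(r"\{\{([^;{}]+);([A-Z]+)\}\}")

def to_template(sentence):
    """Replace each marked concept word by its tag alone,
    e.g. "{{France;LOC}}" becomes "{{LOC}}"."""
    def keep_tag(match):
        tag = match.group(2)
        if tag not in TABLE2_TAGS:
            raise ValueError("unknown concept tag: " + tag)
        return "{{" + tag + "}}"
    return MARKED.sub(keep_tag, sentence)

# Two requests that differ only in the country name.
first = to_template("A ticket to {{France;LOC}}, please.")
second = to_template("A ticket to {{Japan;LOC}}, please.")
```

Both calls produce the same string, so a single stored sentence can serve every destination, with the concrete country supplied later by substitution.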

Traditional approaches to sentence comparison build a vector from key words such as substantives and predicates and compare the vectors. These approaches, however, cannot take grammatical elements into account when identifying the actual type of a sentence. Moreover, considering every morpheme and a wide range of surface information requires a large amount of computation. The present invention therefore defines sentence types and uses them in sentence comparison. For example, the defined sentence types are as shown in Table 3 below.

Table 3

| Main category | Subcategory | Description | Example sentences |
|---|---|---|---|
| Declarative | Inform | Notification, providing information | 저는 채식주의예요. (I am a vegetarian.); 기름진 음식은 안 좋아해요. (I don't like greasy food.); 한 사람 더 올 거에요. (One more person is coming.) |
| Imperative | Request | Demand, request | 커피 리필해주세요. (Please refill my coffee.); 깎아 주세요. (Please give me a discount.); 남은 음식 좀 싸 주실래요? (Could you wrap up the leftovers?) |
| Propositive | Offer | Suggestion | 점심 먹으러 갈까요? (Shall we go for lunch?); 쇼핑하러 가자. (Let's go shopping.); 저녁으로 갈비 어때요? (How about galbi for dinner?) |
| Exclamatory | Admire | Admiration | |
| Interrogative | Ask_what | Question: object | 오늘의 특별 요리는 뭔가요? (What is today's special?); 재질이 뭐지요? (What is the material?) |
| Interrogative | Ask_when | Question: time | 지금 주문하면 언제 오나요? (If I order now, when will it arrive?); 얼마나 기다려야 하나요? (How long do I have to wait?); 몇 시에 문을 닫나요? (What time do you close?) |
| Interrogative | Ask_where | Question: place | 화장실은 어디 있나요? (Where is the restroom?); 백화점은 어디에 있나요? (Where is the department store?) |
| Interrogative | Ask_who | Question: person | 누구에게 물어봐야 하나요? (Whom should I ask?) |
| Interrogative | Ask_which | Question: choice | 어떤 교통편을 이용하실 겁니까? (Which means of transportation will you use?) |
| Interrogative | Ask_why | Question: reason | 음식이 왜 안나오죠? (Why isn't the food coming out?) |
| Interrogative | Ask_how | Question: method, quantity, state | 맛이 어떠니? (How does it taste?); 몇 분이십니까? (How many are in your party?); 얼마예요? (How much is it?) |
| Interrogative | Ask_yn | Question: yes/no | 배달이 되나요? (Do you deliver?); 지금 세일 기간인가요? (Is it on sale now?) |
| Other | Other | Other | 건배!/원샷! (Cheers!/Bottoms up!) |

The data construction and expansion operations described above may be performed offline by a system designer, or by the system on an entered sentence. A sentence entered for translation while the translation service is being provided may also be subject to the data construction and expansion operations.

Next, in the constructed-data storage operation, the candidate sentence data produced by data construction and expansion is converted into a form suited to retrieval and then stored as retrieval information by the sentence storage system 117. In general, a retrieval-based approach is used to determine sentence similarity in order to overcome the speed problem of comparing sentences one by one. The sentence storage system 117 builds an index structure that can fully reflect the added information. For example, the contents stored in the index structure are as shown in Table 4 and Table 5 below. Table 4 shows the structure of the document information file, and Table 5 shows the structure of the posting file.

Table 4

| Contents | Key (number) | Data |
|---|---|---|
| Sentence information | `sid` | `stf, stype, GEN_num(sid1; sid2; ...)` native-language sentence; English sentence |
| CS information appearing in the sentence | last `sid` + `sid` | `(cs_type, start_index, cs_len)+` |

In Table 4, 'sid' is the sentence identifier, 'stf' is the number of index terms appearing in the sentence, 'stype' is the sentence type, 'GEN_num' is the number of index terms in the sentence excluding concept words, 'cs_type' is the concept word type, 'start_index' is the position at which the concept word appears in the sentence, and 'cs_len' is the length of the concept word (cs) as it appears in the sentence.
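The fields of the document information file can be pictured as one record per sentence. The following is a minimal sketch, assuming an in-memory layout; the class names, the choice of character offsets for `start_index` and `cs_len`, and the numeric values in the example are assumptions made for illustration, not the on-disk format of the patent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ConceptSpan:
    cs_type: str      # concept word type, e.g. "LOC"
    start_index: int  # position of the concept word in the sentence (characters assumed)
    cs_len: int       # length of the concept word in the sentence (characters assumed)

@dataclass
class SentenceInfo:
    sid: int                 # sentence identifier
    stf: int                 # number of index terms appearing in the sentence
    stype: str               # sentence type
    gen_num: int             # index terms excluding concept words
    linked_sids: List[int]   # the "(sid1; sid2; ...)" list of the table; its role is not spelled out in the text
    native: str              # native-language sentence
    foreign: str             # English sentence
    concepts: List[ConceptSpan] = field(default_factory=list)

# Illustrative record; stf, gen_num, and linked_sids are made-up values.
record = SentenceInfo(
    sid=1, stf=5, stype="Request", gen_num=3, linked_sids=[2, 3],
    native="뉴욕으로 가는 비행기 편을 예약하고 싶습니다.",
    foreign="I would like to book a flight to New York.",
    concepts=[ConceptSpan("LOC", 0, 2), ConceptSpan("TRP", 8, 3)],
)

# Recover each concept word from its stored span.
concept_words = [record.native[c.start_index:c.start_index + c.cs_len]
                 for c in record.concepts]
```

Storing spans rather than the words themselves lets a later substitution step cut the concept word out of the stored sentence without analyzing it again.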

Table 5

| Contents | Key (string) | Data |
|---|---|---|
| Frequency of all terms | `#term` | Frequency |
| Frequency of all cs | `@cs` | Frequency |
| Frequency of all st | `$st` | Frequency |
| Inverted file for a term | `Term` | `(sf,cf,ttf)(sid,cid,tf)+` |
| Inverted file for a cs | `<cs>` | `(sf,cf,ttf)(sid,cid,tf)+` |
| Inverted file for an st | `<st>` | `(sf,cf,ttf)(sid,cid,tf)+` |
| Total number of terms | `TOTAL_TF_KEY` | Frequency |
| Frequency of the term within the cluster | `cid+term` | Frequency |
| Frequency of the cs within the cluster | `cid+cs` | Frequency |
| Frequency of GEN in the cluster | `cid+GEN` | Frequency |
| Total number of terms in the cluster | `"T"+cid` | Frequency |

In Table 5, 'term' is the keyword list, 'cs' is a concept word, 'st' is the sentence type containing the key, 'cf' is the number of clusters containing the key, 'ttf' is the total frequency of the key, 'sid' is the sentence identifier, 'cid' is the identifier of the cluster containing the key, 'tf' is the frequency of the key within the sentence, and 'stf' is the number of index terms appearing in the sentence.
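The inverted-file rows of the posting file, a `(sf,cf,ttf)` header followed by one `(sid,cid,tf)` entry per sentence, can be built as sketched below. This is an illustrative sketch under stated assumptions: the text does not define `sf`, so it is taken here to be the number of sentences containing the key; the function name, the toy corpus, and the spelling of concept and sentence-type keys are hypothetical.

```python
from collections import Counter, defaultdict

def build_postings(sentences):
    """sentences: iterable of (sid, cid, keys), where keys lists the index terms,
    <cs> keys, and <st> key of one sentence.
    Returns a dict: key -> ((sf, cf, ttf), [(sid, cid, tf), ...])."""
    raw = defaultdict(list)
    for sid, cid, keys in sentences:
        for key, tf in Counter(keys).items():
            raw[key].append((sid, cid, tf))
    index = {}
    for key, postings in raw.items():
        sf = len(postings)                         # assumed: sentences containing the key
        cf = len({cid for _, cid, _ in postings})  # clusters containing the key
        ttf = sum(tf for _, _, tf in postings)     # total frequency of the key
        index[key] = ((sf, cf, ttf), postings)
    return index

# Toy corpus: (sid, cid, keys). Sentences 1 and 2 share cluster 10.
corpus = [
    (1, 10, ["비행기", "예약", "<TRP>", "<Request>"]),
    (2, 10, ["비행기", "표", "비행기", "<TRP>"]),
    (3, 20, ["예약", "확인"]),
]
index = build_postings(corpus)
```

A query then touches only the posting lists of its own keys, which is why retrieval cost depends little on the total number of stored sentence pairs.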

The sentence analysis module 120 extracts the information needed for retrieval from the constructed data and composes keywords. As shown in FIG. 1, the operation of the sentence analysis module 120 includes a morphological analysis step 121 as used in general natural-language processing and, to automatically extract the information described for the data construction and expansion steps, a sentence type determination step 123 and a concept word extraction step 125.

The morphological analysis step 121 is as follows. Morphological analysis is essential for turning a natural-language sentence into a list of key keywords. Through morphological analysis, the present invention composes keywords centered on the substantives and predicates that carry the important information in a sentence. In sentence translation, not only key words such as substantives and predicates but also the secondary morphemes that act as grammatical elements matter. Considering all of the secondary morphemes, however, increases the volume of data to be processed and therefore the computation time. To resolve this, the present invention represents the secondary morphemes, that is, everything other than the key words, as sentence type information. Because only a small number of keywords are kept, as in general retrieval, storage requirements and computation time are reduced.
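The keyword selection described above can be illustrated on the analyzer output format that appears in the FIG. 3 example of this description (`surface/TAG` units joined by `+`). The sketch assumes the analysis string is already available; the set of part-of-speech tags treated as substantives and predicates, and the function names, are assumptions made for illustration.

```python
# Tags assumed to mark substantives and predicates (common noun, proper noun,
# verb, adjective). Which tags the patent actually keeps is not stated.
CONTENT_TAGS = {"NNG", "NNP", "VV", "VA"}

def parse_morphemes(analysis):
    """Split 'a/X+b/Y c/Z' into [(a, X), (b, Y), (c, Z)]."""
    morphemes = []
    for word in analysis.split():
        for unit in word.split("+"):
            surface, tag = unit.rsplit("/", 1)
            morphemes.append((surface, tag))
    return morphemes

def extract_keywords(analysis):
    """Keep only the morphemes whose tag is in CONTENT_TAGS."""
    return [surface for surface, tag in parse_morphemes(analysis)
            if tag in CONTENT_TAGS]

ANALYSIS = ("뉴욕/NNP+으로/JKB 가/VV+는/ETM 비행기/NNG 편/NNB+을/JKO "
            "예약/NNG+하고/JKB 싶/VX+습니다/EF+./SF")
keywords = extract_keywords(ANALYSIS)
```

Twelve morphemes shrink to four keywords here; the particles and endings that were dropped are the material the sentence type is meant to summarize.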

The sentence type determination step 123 is as follows. When a sentence is translated, the words that express grammar and circumstance must be translated along with the words that express content. Storage constraints, however, make it difficult to use all of this information. To overcome this, the present invention defines a predetermined number of sentence types that capture the grammatical elements of the input. For example, the sentence types are as shown in Table 3 above. The defined sentence types are classified grammatically, with processing speed and storage space in mind, and assigned to input sentences. For general use of the constructed grammar, lexico-semantic patterns are preferably used so that later additions and maintenance are easy. A specific example of the sentence type determination step 123 using lexico-semantic patterns is shown in FIG. 2.
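As a rough illustration of pattern-based sentence typing, the sketch below assigns a few of the Table 3 types with an ordered rule list. It is a simplified stand-in: the patterns of FIG. 2 are not reproduced in this text, so the surface regular expressions here are assumptions, and they operate on raw strings rather than on morphological analysis results.

```python
import re

# Ordered (pattern, sentence type) rules; the first match wins.
# These surface patterns are illustrative assumptions, not the patent's patterns.
RULES = [
    (r"어디", "Ask_where"),
    (r"왜", "Ask_why"),
    (r"누구", "Ask_who"),
    (r"주세요\.?$|주실래요\?$", "Request"),
    (r"\?$", "Ask_yn"),
]

def sentence_type(sentence, default="Inform"):
    """Return the type of the first matching rule, or the default type."""
    for pattern, stype in RULES:
        if re.search(pattern, sentence):
            return stype
    return default

# Example sentences taken from Table 3.
types = [sentence_type(s) for s in (
    "화장실은 어디 있나요?",
    "음식이 왜 안나오죠?",
    "커피 리필해주세요.",
    "남은 음식 좀 싸 주실래요?",
    "배달이 되나요?",
    "저는 채식주의예요.",
)]
```

Keeping the rules in a plain ordered list reflects the stated goal of easy addition and maintenance: a new type is one more entry, and its priority is its position.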

The concept word extraction step 125 is as follows. To handle the many expressions that share one meaning, a unit is needed that can process them together. The present invention calls this unit a concept (e.g., place) and calls a concrete expression belonging to a concept a concept word (e.g., Seoul, New York). The present invention defines effective concepts for the collected corpora; for example, the concepts are defined as in Table 2 above. When corpora are collected, the person building the data can mark the necessary concepts in the sentences, but a user enters concept words, not concepts. The translation system must therefore extract the concept words automatically and identify the corresponding concepts. FIG. 3 shows the process for automatically extracting concept words, with a concrete example for each step.

FIG. 3 illustrates a process for automatically extracting concept words according to an embodiment of the present invention. Referring to FIG. 3, an input sentence is received in step 301. For example, the input sentence is "뉴욕으로 가는 비행기 편을 예약하고 싶습니다." ("I would like to book a flight to New York."). In step 303, morphological analysis is performed on the input sentence. The result of the analysis is "뉴욕/NNP+으로/JKB 가/VV+는/ETM 비행기/NNG 편/NNB+을/JKO 예약/NNG+하고/JKB 싶/VX+습니다/EF+./SF". In step 305, the concept word dictionary is searched to extract concept words from the analyzed morphemes. Here, it is assumed that '뉴욕/NNP' (New York) is not registered in the concept word dictionary and that '비행기/NNG' (airplane) is registered. In step 307, primary tagging is performed; as a result, '비행기/NNG' is tagged as '#TRP'. In step 309, matching against the unregistered concept word extraction rules is performed. These rules use the postpositional particle and the verb combined with a noun. In FIG. 3, the unregistered word '뉴욕/NNP' is followed by the particle '으로/JKB' (to), which indicates a destination, and the verb '가/VV' (go), so '뉴욕/NNP' is inferred to be a noun denoting a place. Thus, although '뉴욕/NNP' is not registered in the concept word dictionary, the rule identifies it as a concept word having the concept 'place'. In other words, the concept of a noun is determined using at least one of the particle and the verb combined with that noun. In step 311, secondary tagging is performed, and '뉴욕/NNP' is tagged as '#LOC'. In step 313, the result of concept word extraction is determined to be '{{뉴욕;LOC}}으로 가는 {{비행기;TRP}} 편을 예약하고 싶습니다'.
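The two-stage extraction of FIG. 3 (dictionary lookup, then rule matching for unregistered nouns) can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary contents, the single rule, and the names `CONCEPT_DICT`, `UNREGISTERED_RULES`, and `extract_concepts` are assumptions made for the example, and the morphological analysis is supplied by hand rather than produced by an analyzer.

```python
from typing import Dict, List, Tuple

Morpheme = Tuple[str, str]  # (surface form, part-of-speech tag)

# Hypothetical concept word dictionary: registered surface form -> concept tag.
CONCEPT_DICT: Dict[str, str] = {"비행기": "TRP"}

# Hypothetical rule for unregistered nouns: a noun directly followed by the
# particle 으로/JKB and the verb 가/VV is assumed to denote a place.
UNREGISTERED_RULES = [
    ((("으로", "JKB"), ("가", "VV")), "LOC"),
]

NOUN_TAGS = {"NNG", "NNP"}


def extract_concepts(morphemes: List[Morpheme]) -> List[Tuple[str, str]]:
    """Return (surface, concept) pairs found in a morpheme sequence."""
    found = []
    for i, (surface, tag) in enumerate(morphemes):
        if tag not in NOUN_TAGS:
            continue
        # Steps 305-307: dictionary search and primary tagging.
        if surface in CONCEPT_DICT:
            found.append((surface, CONCEPT_DICT[surface]))
            continue
        # Steps 309-311: rule matching on the following morphemes,
        # then secondary tagging.
        for context, concept in UNREGISTERED_RULES:
            if tuple(morphemes[i + 1 : i + 1 + len(context)]) == context:
                found.append((surface, concept))
                break
    return found


# Hand-written analysis of the example sentence from FIG. 3.
analysis = [
    ("뉴욕", "NNP"), ("으로", "JKB"), ("가", "VV"), ("는", "ETM"),
    ("비행기", "NNG"), ("편", "NNB"), ("을", "JKO"), ("예약", "NNG"),
    ("하고", "JKB"), ("싶", "VX"), ("습니다", "EF"), (".", "SF"),
]
concepts = extract_concepts(analysis)
```

Trying the dictionary first keeps the rules as a fallback, so a registered word is never re-tagged by a looser contextual rule.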

The translation application 130 searches the registered sentences for the sentence most similar to the input sentence and outputs a translated sentence by adapting the retrieved sentence to the situation. As shown in FIG. 1, the operation of the translation application 130 includes a query analysis step 131, a language model sentence comparison step 133, a suitable candidate selection step 135, and a result generation step 137.

The query analysis step 131 extracts, from the input sentence, the information required for the search.

The language model sentence comparison step 133 is as follows. The purpose of sentence comparison is to find the constructed native language sentence that is most similar to the input native language sentence. For sentence comparison, the present invention uses a probability-based language model method. The language model method determines the probability that one sentence is generated from another and uses that probability to evaluate similarity. For example, the probability is determined as in Equation 1 below.

Equation 1 is defined in terms of the following quantities: the similarity; the candidate sentence; the input sentence; the i-th item constituting the input sentence; and the number of items constituting the input sentence.

Under Equation 1, the probability can become zero because many items of the input sentence are not found in the candidate sentence. Therefore, smoothing Equation 1 with the probability that each item appears in the entire set of constructed sentences gives Equation 2 below.

Equation 2 is defined in terms of the following quantities: the similarity; the candidate sentence; the input sentence; the i-th item constituting the input sentence; the number of items constituting the input sentence; a weight; and the entire set of constructed sentences.

The similarity between each candidate sentence and the input sentence is determined according to Equation 2, and the sentence with the maximum probability is determined to be the similar sentence.
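The effect of the smoothing can be illustrated with a short sketch. The exact forms of Equation 1 and Equation 2 are not reproduced in this text, so the code assumes a common query-likelihood formulation: the similarity is the product, over the items of the input sentence, of a linear interpolation between the item's relative frequency in the candidate sentence and its relative frequency in the whole sentence collection. The function names, the default weight of 0.7, and the English toy tokens are all hypothetical.

```python
from collections import Counter
from typing import List


def item_prob(item: str, items: List[str]) -> float:
    """Relative frequency of an item in a list of items (0.0 for an empty list)."""
    return Counter(items)[item] / len(items) if items else 0.0


def similarity(query: List[str], candidate: List[str],
               collection: List[str], weight: float = 0.7) -> float:
    """Product over query items of an interpolated candidate/collection probability.

    weight = 1.0 disables smoothing (the assumed form of Equation 1);
    weight < 1.0 mixes in the collection probability (the assumed form of Equation 2).
    """
    score = 1.0
    for item in query:
        score *= (weight * item_prob(item, candidate)
                  + (1.0 - weight) * item_prob(item, collection))
    return score


cand_a = ["ticket", "to", "LOC"]
cand_b = ["student", "ticket"]
collection = cand_a + cand_b          # every item of every constructed sentence
query = ["ticket", "LOC"]

unsmoothed_b = similarity(query, cand_b, collection, weight=1.0)
smoothed_a = similarity(query, cand_a, collection)
smoothed_b = similarity(query, cand_b, collection)
```

Without smoothing, candidate B scores exactly zero because it lacks one query item; with smoothing it keeps a small nonzero score while candidate A, which contains both items, still ranks first.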

A single concept may be expressed by various words in a sentence. For example, words such as Seoul, Busan, and Incheon share the concept of a city name. Therefore, in addition to using words as keywords in a search, the concepts of the words should also be used as keywords. The present invention defines the keyword set used to represent a sentence in indexing and search using concepts together with words. For example, the keyword set is given by Equation 3 below.

Equation 3 is defined in terms of the following quantities: the keyword set; the set of words contained in the sentence; the concept set of the sentence; and the sentence type of the sentence.

Here, the concept set may be expressed as in Equation 4 below.

Equation 4 is defined in terms of the following quantities: the concept set of the sentence; a concept; the set of words contained in the sentence; the i-th concept; the concept class of words that do not have a specific concept; and a word set that includes a given element.

If a concept is added to the keywords only for words that have a concept word class, sentences containing concept words become longer, so concept words receive relatively lower probabilities than ordinary words. To handle this problem, the present invention assigns words that have no concept to a dedicated concept class, namely the class for words without a specific concept defined in Equation 4.
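A keyword set of this kind can be sketched as a small data structure. The sketch below assumes that every word contributes exactly one concept item, with a placeholder class for words that have no concept, so that sentences with and without concept words have keyword sets of comparable size. The label `NONE`, the sentence type `REQUEST`, and the names `KeywordSet` and `build_keyword_set` are hypothetical and do not come from the source.

```python
from typing import Dict, List, NamedTuple, Tuple

NO_CONCEPT = "NONE"  # hypothetical label for the class of words without a concept


class KeywordSet(NamedTuple):
    words: Tuple[str, ...]      # words contained in the sentence
    concepts: Tuple[str, ...]   # one concept item per word
    sentence_type: str          # sentence type of the sentence


def build_keyword_set(words: List[str], concept_of: Dict[str, str],
                      sentence_type: str) -> KeywordSet:
    """Combine words, their concepts, and the sentence type into one keyword set."""
    concepts = tuple(concept_of.get(word, NO_CONCEPT) for word in words)
    return KeywordSet(tuple(words), concepts, sentence_type)


ks = build_keyword_set(
    ["뉴욕", "비행기", "예약"],
    {"뉴욕": "LOC", "비행기": "TRP"},
    "REQUEST",
)
```

Because '예약' has no registered concept, it is mapped to the placeholder class instead of being skipped, which keeps the number of concept items equal to the number of words.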

The translation application 130 compares one sentence against another, so relatively few words match. Because sufficient probability information for sentence comparison is not guaranteed, performance may degrade. In addition, sentences with the same meaning may be composed of entirely different words. To prevent the similarity between sentences of the same meaning from dropping because of word variety and sparse information, the present invention searches using clusters that group sentences of the same meaning, rather than single sentences. The clusters are determined by clustering, an example of which is shown in FIG. 4.

FIG. 4 illustrates clustering according to an embodiment of the present invention. Referring to FIG. 4, the input sentence 410 is "차가 얼마나 늦게까지 운행합니까?" ("How late does the car run?"). The candidate sentences share no information, or only one item, with the input sentence 410, so an appropriate sentence is difficult to determine. However, when the information of sentences with the same meaning is taken into account, the sentences in cluster B 402 are more similar to the input sentence 410 than those in cluster A 401 and cluster C 403, because they are supported by information from the other sentences in the same cluster. This cluster information can be applied to Equation 2 through a smoothing technique. Applying the cluster concept, Equation 2 becomes Equation 5 below.

Equation 5 is defined in terms of the following quantities: the similarity; the candidate sentence; the input sentence; the i-th item constituting the input sentence; the number of items constituting the input sentence; two weights; the set of sentences having the same meaning as the candidate sentence; and the entire set of constructed sentences.

In Equation 5, the probability of each item is determined by maximum-likelihood estimation (MLE), as in Equation 6 below.

Equation 6 is defined in terms of the following quantities: the candidate sentence; the i-th item constituting the input sentence; and the number of items constituting the input sentence.

In addition, the two weights in Equation 5 are determined by applying Dirichlet prior smoothing, as in Equation 7 below.

Equation 7 is defined in terms of the following quantities: the two weights used to determine the probability; the candidate sentence; and the set of sentences having the same meaning as the candidate sentence.
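Cluster-based smoothing can be sketched as below. Since Equations 5 to 7 are not reproduced in this text, the code assumes one plausible arrangement: maximum-likelihood item probabilities are taken from the candidate sentence, from its cluster, and from the whole collection, and they are mixed with two Dirichlet-prior weights of the form length / (length + mu). The nesting of the interpolation, the prior values `mu_s` and `mu_c`, all function names, and the English toy tokens are assumptions for illustration only.

```python
from collections import Counter
from typing import List


def ml_prob(item: str, items: List[str]) -> float:
    """Maximum-likelihood estimate: count of the item divided by the total length."""
    return Counter(items)[item] / len(items) if items else 0.0


def dirichlet_weight(length: int, mu: float) -> float:
    """Weight given to the observed text; it grows with the amount of evidence."""
    return length / (length + mu)


def cluster_similarity(query: List[str], candidate: List[str],
                       cluster: List[str], collection: List[str],
                       mu_s: float = 2.0, mu_c: float = 5.0) -> float:
    """Query likelihood smoothed by the candidate's cluster and the full collection."""
    lam = dirichlet_weight(len(candidate), mu_s)   # candidate vs. background
    beta = dirichlet_weight(len(cluster), mu_c)    # cluster vs. collection
    score = 1.0
    for item in query:
        background = (beta * ml_prob(item, cluster)
                      + (1.0 - beta) * ml_prob(item, collection))
        score *= lam * ml_prob(item, candidate) + (1.0 - lam) * background
    return score


# Toy data loosely following FIG. 4: items of all sentences in each cluster.
cluster_a = ["where", "bus", "stop", "where", "station"]
cluster_b = ["until", "when", "run", "how", "late", "bus", "run"]
collection = cluster_a + cluster_b
query = ["how", "late", "run"]

score_a = cluster_similarity(query, ["where", "bus", "stop"], cluster_a, collection)
score_b = cluster_similarity(query, ["until", "when", "run"], cluster_b, collection)
```

The candidate from cluster B shares only one item with the query, but the other sentences of its cluster supply 'how' and 'late', so it scores well above the candidate from cluster A.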

The result generation step 137 is as follows. Once sentence similarity has been determined through the language model, the user selects, from the presented native language sentences, the one closest in meaning. None of the presented sentences may exactly match the sentence the user wants. In particular, a presented sentence may have the expression the user wants while differing in content, because only the concept matches and the specific word does not. To handle this situation, the present invention corrects the translated sentence using the concept word bilingual dictionary. The correction is performed by substituting words that share the same concept. For each word or phrase that can be a concept word, the concept word bilingual dictionary links the corresponding native language and foreign language words, and it is used to substitute the concept word in the selected sentence. An example of result generation using the concept word bilingual dictionary is shown in FIG. 5.

FIG. 5 illustrates concept word substitution using the concept word bilingual dictionary according to an embodiment of the present invention. Referring to FIG. 5, the input sentence 510 is "뉴욕까지 가는 비행기 표 두 장 주세요." ("Two plane tickets to New York, please."). The sentences 520 presented through similarity comparison with the input sentence 510 are "I would like to book a round-trip ticket for a flight from Seoul to New York.", "One ticket to San Francisco, please." ("샌프란시스코까지 가는 표 한 장 주세요."), "How much is a round-trip plane ticket to New York?", "One student ticket, please.", and "One adult round-trip ticket, please.". The user selects the second sentence, "One ticket to San Francisco, please.". Accordingly, the translation application 130 displays a message 530 that asks whether to substitute and lists the substitutable concept words. As shown in FIG. 5, the substitutable concept words in the second sentence are '샌프란시스코' (San Francisco) and '한 장' (one ticket). The user selects 'substitute all', and "Two tickets to New York, please." is output as the translation result 540.
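The substitution in FIG. 5 can be sketched with a small bilingual lookup. The dictionary layout, the concept tags `LOC` and `NUM`, the stored foreign sentence, and the function `substitute` are assumptions made for this example; the source does not specify how the dictionary is organized or how the foreign sentence is rewritten.

```python
from typing import Dict

# Hypothetical concept word bilingual dictionary: concept -> {native: foreign}.
BILINGUAL: Dict[str, Dict[str, str]] = {
    "LOC": {"샌프란시스코": "San Francisco", "뉴욕": "New York"},
    "NUM": {"한 장": "One ticket", "두 장": "Two tickets"},
}


def substitute(foreign_sentence: str,
               selected_concepts: Dict[str, str],
               input_concepts: Dict[str, str],
               dictionary: Dict[str, Dict[str, str]]) -> str:
    """Swap the foreign equivalents of concept words that differ from the input.

    selected_concepts and input_concepts map a concept tag to the native
    concept word found in the selected sentence and in the input sentence.
    """
    result = foreign_sentence
    for concept, old_word in selected_concepts.items():
        new_word = input_concepts.get(concept)
        if new_word is None or new_word == old_word:
            continue  # nothing to substitute for this concept
        entries = dictionary[concept]
        result = result.replace(entries[old_word], entries[new_word])
    return result


translated = substitute(
    "One ticket to San Francisco, please.",      # stored translation of the selected sentence
    {"LOC": "샌프란시스코", "NUM": "한 장"},        # concept words in the selected sentence
    {"LOC": "뉴욕", "NUM": "두 장"},                # concept words in the input sentence
    BILINGUAL,
)
```

This corresponds to the 'substitute all' choice; substituting only some concept words would amount to passing a subset of `selected_concepts`. A word missing from the dictionary raises `KeyError` in this sketch, which a real application would need to handle.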

The configuration and operation of a portable terminal that provides the translation service described above are described below.

FIG. 6 is a block diagram of a portable terminal according to an embodiment of the present invention.

As shown in FIG. 6, the portable terminal includes a display unit 602, an input unit 604, a storage unit 606, and a control unit 608.

The display unit 602 displays status information generated during operation of the terminal, as well as numbers, characters, and images produced by running applications. That is, the display unit 602 displays the image data provided by the control unit 608 as a visual screen. For example, the display unit 602 may be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.

The input unit 604 recognizes input generated by the user and provides the corresponding information to the control unit 608. That is, the input unit 604 processes user input received through a keyboard, keypad, touch screen, touch pad, mouse, special function buttons, and the like.

The storage unit 606 stores the operating system, applications, and user content required for the operation of the portable terminal. In particular, the storage unit 606 stores the bilingual translation data, concept word bilingual dictionary, sentence information, index information, and translation application for the translation service according to an embodiment of the present invention. The storage unit 606 provides stored data to the control unit 608 at the request of the control unit 608.

The control unit 608 controls the overall functions of the portable terminal. For example, the control unit 608 provides image data to the display unit 602 and processes input information provided by the input unit 604. In particular, according to an embodiment of the present invention, the control unit 608 controls the functions for the translation service. These include a data management function for the translation service, a sentence analysis function, and an application execution function for the translation service, each of which is described in detail below.

For data management, the control unit 608 stores construction data for the translation service, received from an external source, in the storage unit 606. The construction data includes sentence information, mapping information between native language sentences and foreign language sentences, category classification information for the sentences, concept word information for sentence comparison, sentence type information, regular expression information for each sentence type, cluster information, index information for sentence search, and the like. For example, the category classification may be as shown in Table 2, the sentence types may be defined as in Table 3, the sentence information may be structured as in Table 4, and the index information may be structured as in Table 5. In addition, the control unit 608 augments the construction data by storing, in the storage unit 606, the analysis results of native language sentences entered for translation while the translation service is provided.

For the sentence analysis function, the control unit 608 analyzes the morpheme composition of the input sentence entered by the user for translation, determines the sentence type of the input sentence from the morpheme composition, and then extracts the concept words contained in the input sentence. The input sentence may contain no concept word. In the morphological analysis, the control unit 608 treats key words such as substantives and predicates as keywords and treats ancillary morphemes, which are grammatical elements, as information indicating the sentence type. In determining the sentence type, the control unit 608 compares the result of the morphological analysis with the morpheme composition of predefined regular expressions and thereby determines which of the predefined sentence types the input sentence belongs to. For example, the candidate sentence types are as shown in Table 3 above, and the control unit 608 determines the sentence type as shown in FIG. 2. In extracting concept words, the control unit 608 refers to the pre-built concept word bilingual dictionary: it first extracts the concept words registered in the dictionary, then extracts unregistered concept words according to the unregistered concept word extraction rules, and then tags the extracted words. Because an input sentence does not always contain a concept word, the tagging may be omitted when no concept word is extracted.
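Matching a morphological analysis against per-type regular expressions can be sketched as follows. The sentence types actually used are those of Table 3, which is not reproduced here, so the type names `QUESTION`, `WISH`, and `OTHER`, the two patterns, and the "surface/TAG" serialization are all hypothetical stand-ins.

```python
import re
from typing import List, Tuple

# Hypothetical sentence types and patterns over a "surface/TAG" serialization.
SENTENCE_TYPE_PATTERNS = [
    ("QUESTION", re.compile(r"\?/SF$")),        # ends with a question mark
    ("WISH", re.compile(r"싶/VX \S+/EF")),       # auxiliary 싶 followed by a final ending
]


def sentence_type(morphemes: List[Tuple[str, str]]) -> str:
    """Return the first sentence type whose pattern matches the analysis."""
    serialized = " ".join(f"{surface}/{tag}" for surface, tag in morphemes)
    for name, pattern in SENTENCE_TYPE_PATTERNS:
        if pattern.search(serialized):
            return name
    return "OTHER"


wish = sentence_type(
    [("예약", "NNG"), ("하고", "JKB"), ("싶", "VX"), ("습니다", "EF"), (".", "SF")]
)
question = sentence_type(
    [("얼마", "NNG"), ("이", "VCP"), ("ㅂ니까", "EF"), ("?", "SF")]
)
```

Only the ancillary morphemes (auxiliaries, endings, punctuation) appear in the patterns, which mirrors the split described above between content keywords and grammatical elements that indicate the sentence type.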

For the application execution function, the control unit 608 executes the translation application stored in the storage unit 606. The translation application includes a user interface for the translation service and sentence comparison and search algorithms. While executing the translation application, the control unit 608 displays the defined user interface screens through the display unit 602. The user interface screens include a screen for entering a sentence, a screen presenting similar sentences, a screen for substituting concept words, a screen showing the translation result, and the like. When a native language sentence to be translated is entered after the translation application is launched, the control unit 608 analyzes the input sentence using the sentence analysis function, selects and presents at least one similar sentence using the sentence comparison algorithm, asks whether to substitute concept words and performs the substitution, and then displays the translated sentence. To select the at least one similar sentence, the control unit 608 compares the input sentence with the candidate sentences stored in the storage unit 606, using the clustering of the candidate sentences. Specifically, the control unit 608 searches the candidate sentences that belong to the sentence type of the input sentence, as determined by the sentence analysis function, for sentences containing some or all of the keywords of the input sentence. The control unit 608 then checks the cluster distribution of the sentences containing some or all of the keywords and presents at least one similar sentence from the cluster containing the largest number of sentences. For example, the control unit 608 calculates, for each candidate sentence, the probability that the input sentence is generated from it and presents the at least one candidate sentence with the highest probability as the at least one similar sentence. The probability may be calculated, for example, as in Equation 5. The number of similar sentences presented may vary with the specific embodiment. When the user selects a sentence from the at least one similar sentence, the control unit 608 asks whether to substitute at least one concept word contained in the selected sentence and substitutes or keeps the at least one concept word according to the user's input. That is, the control unit 608 displays a screen asking whether to substitute and, according to the user's input, substitutes all or some of the at least one concept word or keeps the at least one concept word.
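The candidate search described above (filter by sentence type, keep sentences sharing keywords, then pick the dominant cluster) can be sketched as follows. The record layout `Candidate`, the function `shortlist`, the cluster labels, and the English toy sentences are hypothetical; the sketch also assumes that "the cluster containing the largest number of sentences" is counted over the keyword-matching sentences.

```python
from collections import Counter
from typing import List, NamedTuple, Set


class Candidate(NamedTuple):
    text: str
    sentence_type: str
    keywords: Set[str]
    cluster: str


def shortlist(query_type: str, query_keywords: Set[str],
              candidates: List[Candidate]) -> List[Candidate]:
    """Keep same-type candidates sharing a keyword, restricted to the dominant cluster."""
    matches = [c for c in candidates
               if c.sentence_type == query_type and c.keywords & query_keywords]
    if not matches:
        return []
    # Cluster distribution of the matching sentences.
    top_cluster, _ = Counter(c.cluster for c in matches).most_common(1)[0]
    return [c for c in matches if c.cluster == top_cluster]


candidates = [
    Candidate("When is the last bus?", "QUESTION", {"last", "bus"}, "B"),
    Candidate("How late does the subway run?", "QUESTION", {"late", "subway", "run"}, "B"),
    Candidate("Where is the bus stop?", "QUESTION", {"bus", "stop"}, "A"),
    Candidate("Please stop the bus here.", "REQUEST", {"stop", "bus"}, "C"),
]
result = shortlist("QUESTION", {"bus", "late", "run"}, candidates)
```

The shortlisted sentences would then be ranked by the generation probability before being presented, which this sketch leaves out.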

The embodiment described above with reference to FIG. 6 receives a native language sentence and provides a foreign language sentence. According to another embodiment of the present invention, however, the control unit 608 may receive keywords instead of a native language sentence. In this case, the control unit 608 presents at least one candidate sentence containing the keywords so that the user can select the sentence to be translated into the foreign language.

FIG. 7 illustrates an operation procedure of a portable terminal according to an embodiment of the present invention.

As shown in FIG. 7, the portable terminal stores construction data in step 701. The construction data includes sentence information and index information for the translation service. More specifically, the construction data includes sentence information, mapping information between native language sentences and foreign language sentences, category classification information for the sentences, concept word information for sentence comparison, sentence type information, regular expression information for each sentence type, cluster information, index information, and the like. For example, the sentence information and the index information may be structured as in Table 4 and Table 5. Additionally, the portable terminal may store bilingual translation data and a concept word bilingual dictionary. The data stored in step 701 is prepared by the system designer and provided from an external source before the translation service is first provided.

The portable terminal then proceeds to step 703 and checks whether a native language sentence to be translated has been entered. That is, after the translation application is launched, the portable terminal determines whether a native language sentence to be translated is entered.

When the native language sentence is entered, the portable terminal proceeds to step 705 and analyzes the native language sentence to be translated, that is, the input sentence. The analysis of the input sentence includes morphological analysis, sentence type determination, and concept word extraction. That is, the portable terminal analyzes the morpheme composition of the input sentence, determines the sentence type of the input sentence from the morpheme composition, and then extracts the concept words contained in the input sentence. The input sentence may contain no concept word. In the morphological analysis, the portable terminal treats key words such as substantives and predicates as keywords and treats ancillary morphemes, which are grammatical elements, as information indicating the sentence type. In determining the sentence type, the portable terminal compares the result of the morphological analysis with the morpheme composition of predefined regular expressions and thereby determines which of the predefined sentence types the input sentence belongs to. For example, the portable terminal determines the sentence type as shown in FIG. 2. In extracting concept words, the portable terminal first extracts the concept words registered in the concept word bilingual dictionary, then extracts unregistered concept words according to the unregistered concept word extraction rules, and then tags the extracted words. Because an input sentence does not always contain a concept word, the tagging may be omitted when no concept word is extracted.

The portable terminal then proceeds to step 707 and stores information about the input sentence. That is, the portable terminal augments the construction data by storing the analysis information of the input sentence obtained in step 705. According to another embodiment of the present invention, step 707 may be omitted or performed at a different time.

Next, the portable terminal proceeds to step 709 and presents at least one similar sentence. The at least one similar sentence is at least one of the candidate sentences stored in the construction data and is selected by comparing the input sentence with the candidate sentences. Here, the portable terminal uses the clustering of the candidate sentences; that is, each candidate sentence belongs to a specific cluster. Specifically, the portable terminal searches the candidate sentences that belong to the sentence type of the input sentence, as determined in step 705, for sentences containing some or all of the keywords of the input sentence. The portable terminal then checks the cluster distribution of the sentences containing some or all of the keywords and presents at least one similar sentence from the cluster containing the largest number of sentences. For example, the portable terminal calculates, for each candidate sentence, the probability that the input sentence is generated from it and presents the at least one candidate sentence with the highest probability as the at least one similar sentence. The probabilities may be calculated, for example, as in Equation 5. The number of similar sentences presented may vary with the specific embodiment.

After presenting the at least one similar sentence, the portable terminal proceeds to step 711 and checks whether the user selects one of the presented similar sentences. That is, the portable terminal determines whether a user operation for selecting a sentence has occurred.

When a sentence is selected by the user, the portable terminal proceeds to step 713 and determines whether the selected sentence includes at least one concept word to be substituted. Here, a concept word to be substituted is a concept word that is included in the selected sentence and belongs to the same concept as a concept word of the input sentence, but denotes a different object. For example, if the input sentence includes 'New York' and the selected sentence includes 'Seoul', then 'Seoul' is a concept word to be substituted. If the selected sentence does not include any concept word to be substituted, the portable terminal proceeds to step 717.

On the other hand, if the selected sentence includes at least one concept word to be substituted, the portable terminal proceeds to step 715, inquires whether to substitute the at least one concept word, and substitutes or keeps the at least one concept word according to the user's operation. That is, the portable terminal displays a screen asking whether to perform the substitution and, according to the user's operation, either substitutes all or some of the at least one concept word or keeps the at least one concept word unchanged.
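Steps 713 and 715 can be illustrated with the short Python sketch below. It is a sketch under assumptions rather than the patented implementation: the function names are hypothetical, concept words are assumed to be already tagged as a mapping from surface word to concept label (for example `"CITY"`), and the inquiry screen is represented by a `confirm` callback that returns the user's yes/no answer for each word.

```python
def find_substitutions(input_concepts, selected_concepts):
    """Step 713 (sketch): pair each concept word in the selected sentence with an
    input-sentence concept word of the same concept but a different object.

    Both arguments map a surface word to its concept label (an assumed format).
    """
    by_concept = {}
    for word, concept in input_concepts.items():
        by_concept.setdefault(concept, word)  # first input word per concept
    pairs = []
    for word, concept in selected_concepts.items():
        replacement = by_concept.get(concept)
        if replacement is not None and replacement != word:
            pairs.append((word, replacement))
    return pairs


def apply_substitutions(sentence, pairs, confirm):
    """Step 715 (sketch): substitute or keep each concept word per the user's answer."""
    for old, new in pairs:
        if confirm(old, new):  # stands in for the inquiry screen
            sentence = sentence.replace(old, new)
    return sentence
```

With the example from the text, `find_substitutions({"New York": "CITY"}, {"Seoul": "CITY"})` pairs 'Seoul' with 'New York', and the substitution is applied only if the `confirm` callback accepts it.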

Thereafter, the portable terminal proceeds to step 717 and provides, as the translation result, the foreign-language sentence corresponding to the finalized native-language sentence. In other words, the portable terminal displays on the screen the foreign-language sentence that is stored as the translation of the finalized native-language sentence.

With reference to FIG. 7, the present invention has been described for an embodiment in which a native-language sentence is input and a foreign-language sentence is provided. However, according to another embodiment of the present invention, the portable terminal may receive a keyword instead of a native-language sentence. In this case, the portable terminal may present at least one candidate sentence that includes the keyword, so that the user can select the sentence to be translated into the foreign language.
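The keyword-input variant can be sketched as a simple inverted index over the stored sentences. The Python below is a minimal sketch under assumptions: the phrasebook layout (a list of `(native, foreign, keywords)` tuples) and all function names are hypothetical and are not taken from the construction data format of the invention.

```python
from collections import defaultdict


def build_keyword_index(phrasebook):
    """Map each keyword to the native-language sentences that contain it."""
    index = defaultdict(list)
    for native, _foreign, keywords in phrasebook:
        for keyword in dict.fromkeys(keywords):  # drop duplicate keywords, keep order
            index[keyword].append(native)
    return index


def candidates_for_keyword(index, keyword):
    """Candidate sentences to present when the user enters a keyword."""
    return list(index.get(keyword, []))


def stored_translation(phrasebook, native):
    """Look up the foreign-language sentence stored for the chosen sentence."""
    for candidate, foreign, _keywords in phrasebook:
        if candidate == native:
            return foreign
    return None
```

For instance, with a phrasebook entry `("화장실이 어디예요?", "Where is the restroom?", ["화장실"])`, entering the keyword `"화장실"` would list that sentence, and selecting it would return its stored English translation.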

Although specific embodiments have been described in the detailed description of the present invention, various modifications are of course possible without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims set forth below and their equivalents.

Claims

A method for providing a translation service that translates a first language into a second language in a portable terminal, the method comprising:
extracting at least one concept word from an input sentence when the input sentence of the first language is input;
selecting and presenting at least one similar sentence from among candidate sentences stored in construction data;
displaying, when a similar sentence selected by a user from among the at least one similar sentence includes a substitution-target concept word, a screen for inquiring whether to substitute the substitution-target concept word;
replacing the substitution-target concept word with the concept word included in the input sentence; and
displaying a translated sentence obtained by translating the sentence including the substituted concept word into the second language,
wherein the substitution-target concept word is a concept word that is included in the selected similar sentence and belongs to the same concept as the concept word included in the input sentence, but represents a different object.

The method of claim 1,
wherein extracting the at least one concept word comprises:
extracting a concept word registered in a concept word bilingual dictionary;
extracting an unregistered concept word according to an unregistered concept word extraction rule; and
tagging the extracted at least one concept word.

The method of claim 2,
wherein extracting the unregistered concept word according to the unregistered concept word extraction rule comprises:
determining a concept of a noun using at least one of a verb and a verb combined with the noun.

The method of claim 1, further comprising:
analyzing a morpheme structure of the input sentence when the input sentence is input; and
determining a sentence type of the input sentence using a result of the morpheme structure analysis.

The method of claim 4,
wherein determining the sentence type of the input sentence comprises:
comparing the result of the morpheme structure analysis with a morpheme structure of a predefined regular expression.

The method of claim 4,
wherein selecting the at least one similar sentence comprises:
searching, among candidate sentences belonging to the sentence type of the input sentence, for candidate sentences including some or all of the keywords of the input sentence; and
identifying a cluster distribution of the candidate sentences including some or all of the keywords, and selecting the at least one similar sentence within the cluster including the largest number of sentences.

The method of claim 6,
wherein the at least one similar sentence is selected using probabilities that the input sentence is generated from each candidate sentence.

The method of claim 7,
wherein the probabilities that the input sentence is generated from each candidate sentence are determined by the following equations:

[Equation images and symbol glyphs not preserved in this text. The surviving symbol definitions are listed below in their original order.]

First equation: the similarity; a candidate sentence; the input sentence; the i-th item constituting the input sentence; the number of items constituting the input sentence; two weights; the set of sentences having the same meaning as a given sentence; and the complete set of sentences.

Second equation: a candidate sentence; the i-th item constituting the input sentence; and the number of items constituting the input sentence.

Third equation: two weights for determining a probability value; a candidate sentence; and the set of sentences having the same meaning as a given sentence.

The method of claim 1, further comprising:
reinforcing the construction data by storing the input sentence.

The method of claim 1,
wherein the construction data includes at least one of sentence information, mapping relationship information between first-language sentences and second-language sentences, category classification information of sentences, concept word information for sentence comparison, sentence type information, regular expression information for each sentence type, index information for sentence search, and cluster information.

A portable terminal apparatus for providing a translation service that translates a first language into a second language, the apparatus comprising:
a controller configured to, when an input sentence of the first language is input, extract at least one concept word from the input sentence and select and present at least one similar sentence from among candidate sentences stored in construction data; and
a display unit configured to display, when a similar sentence selected by a user from among the at least one similar sentence includes a substitution-target concept word, a screen for inquiring whether to substitute the substitution-target concept word,
wherein the controller replaces the substitution-target concept word with the concept word included in the input sentence,
wherein the display unit displays a translated sentence obtained by translating the sentence including the substituted concept word into the second language, and
wherein the substitution-target concept word is a concept word that is included in the selected similar sentence and belongs to the same concept as the concept word included in the input sentence, but represents a different object.

The apparatus of claim 11,
wherein the controller extracts a concept word registered in a concept word bilingual dictionary, extracts an unregistered concept word according to an unregistered concept word extraction rule, and then tags the extracted at least one concept word.

The apparatus of claim 12,
wherein the controller determines a concept of a noun using at least one of a verb and a verb combined with the noun.

The apparatus of claim 11,
wherein the controller, when the input sentence is input, analyzes a morpheme structure of the input sentence and then determines a sentence type of the input sentence using a result of the morpheme structure analysis.

The apparatus of claim 14,
wherein the controller determines the sentence type by comparing the result of the morpheme structure analysis with a morpheme structure of a predefined regular expression.

The apparatus of claim 14,
wherein the controller searches, among candidate sentences belonging to the sentence type of the input sentence, for candidate sentences including some or all of the keywords of the input sentence, identifies a cluster distribution of the candidate sentences including some or all of the keywords, and selects the at least one similar sentence within the cluster including the largest number of sentences.

The apparatus of claim 16,
wherein the at least one similar sentence is selected using probabilities that the input sentence is generated from each candidate sentence.

The apparatus of claim 17,
wherein the probabilities that the input sentence is generated from each candidate sentence are determined by the following equations:

[Equation images and symbol glyphs not preserved in this text. The surviving symbol definitions are listed below in their original order.]

First equation: the similarity; a candidate sentence; the input sentence; the i-th item constituting the input sentence; the number of items constituting the input sentence; two weights; the set of sentences having the same meaning as a given sentence; and the complete set of sentences.

Second equation: a candidate sentence; the i-th item constituting the input sentence; and the number of items constituting the input sentence.

Third equation: two weights for determining a probability value; a candidate sentence; and the set of sentences having the same meaning as a given sentence.

The apparatus of claim 11,
wherein the controller reinforces the construction data by storing the input sentence.

The apparatus of claim 11,
wherein the construction data includes at least one of sentence information, mapping relationship information between first-language sentences and second-language sentences, category classification information of sentences, concept word information for sentence comparison, sentence type information, regular expression information for each sentence type, index information for sentence search, and cluster information.