KR101450795B1

KR101450795B1 - Apparatus and method for anaphora resolution

Info

Publication number: KR101450795B1
Application number: KR1020130083799A
Authority: KR
Inventors: 송도규
Original assignee: (주)센솔로지
Priority date: 2013-07-16
Filing date: 2013-07-16
Publication date: 2014-10-16

Abstract

The present invention relates to a resource description framework (RDF) triple-based anaphora resolution apparatus comprising: an RDF triple conversion unit that converts each sentence in an input text into RDF triples; and an anaphora processing unit that finds, among the plurality of triples produced by the RDF triple conversion unit, an anaphor triple containing an anaphor, and identifies the antecedent of the anaphor by comparing the anaphor triple with the triples that precede it.

Description

APPARATUS AND METHOD FOR ANAPHORA RESOLUTION

The present invention relates to an apparatus and method for anaphora resolution.

Techniques have recently been studied for conveying the meaning of knowledge and information by converting language into Resource Description Framework (RDF) triples, a format that computers can understand. The RDF triple is an international standard administered by the World Wide Web Consortium (W3C), and represents knowledge and information as a set of three elements: subject (resource), predicate (property), and object (literal).

An anaphor cannot refer independently; it is specified by another expression in the sentence, namely its antecedent. Anaphors are an indispensable element of natural language and underpin linguistic economy. People need considerable reading training to identify accurately the antecedent to which an anaphor refers, and they need extra-linguistic knowledge in addition to linguistic knowledge. It is therefore difficult for a computer to find the antecedent to which an anaphor refers.

Conventional automatic natural language processing apparatuses have identified the antecedent of an anaphor based on how well the anaphor and the antecedent agree in grammatical features such as gender, number, and person, and in semantic features such as person, thing, place, food, and organization. However, when there are several antecedent candidates with similar grammatical or semantic features, or when the anaphor is a zero anaphor, it is difficult to identify the antecedent. A method is therefore needed by which a computer accurately identifies the antecedent of an anaphor using context information expressed as RDF triples.

Korean Registered Patent No. 10-1092355, Anaphora Resolution Method (published December 9, 2011)

An object of the present invention is to provide a context-based anaphora resolution apparatus and method.

A Resource Description Framework (RDF) triple-based anaphora resolution apparatus according to one embodiment of the present invention includes an RDF triple conversion unit that converts each sentence in an input text into RDF triples, and an anaphora processing unit that finds, among the plurality of triples produced by the RDF triple conversion unit, an anaphor triple containing an anaphor, and identifies the antecedent of the anaphor by comparing the anaphor triple with its preceding triples.

The anaphora processing unit may extract, from the plurality of triples, a triple that has an anaphor in its subject or object position as an anaphor triple.

The anaphora processing unit may treat as an anaphor an expression containing a demonstrative, a common noun, a pronoun, or a zero-anaphor marker indicating that the subject or object of an RDF triple is empty.

The anaphora processing unit may find, among the preceding triples, a related preceding triple that has the same vocabulary or the same features as the anaphor triple, and identify the antecedent of the anaphor in that related preceding triple.

When the anaphor has the form of a demonstrative plus a noun, the anaphora processing unit may search the preceding triples for a subject or object containing that noun, and identify the expression in the subject or object found as the antecedent.

If no subject or object containing the noun is found among the preceding triples, the anaphora processing unit may search for a subject or object containing a word with the same features as the noun.

When the anaphor is in the subject position, the anaphora processing unit may compare the objects and predicates of the preceding triples with those of the anaphor triple to find the related preceding triple, and identify the subject of the related preceding triple as the antecedent.

When the anaphor is in the object position, the anaphora processing unit may compare the subjects and predicates of the preceding triples with those of the anaphor triple to find the related preceding triple, and identify the object of the related preceding triple as the antecedent.

The anaphora processing unit may restore the anaphor triple by replacing the anaphor with the antecedent, and store the restored anaphor triple.

The anaphora processing unit may include an anaphor extraction unit that extracts, from the plurality of triples produced by the RDF triple conversion unit, an anaphor triple containing an anaphor; an antecedent identification unit that identifies the antecedent of the anaphor by comparing the anaphor triple with its preceding triples; and an RDF triple restoration unit that restores the anaphor triple by replacing the anaphor with the antecedent.

A method by which an anaphora resolution apparatus resolves anaphors based on Resource Description Framework (RDF) triples, according to another embodiment of the present invention, includes: receiving a plurality of RDF triples corresponding to the sentences of an input text; finding, among the received triples, an anaphor triple containing an anaphor; finding, among the preceding triples of the anaphor triple, a related preceding triple that has the same vocabulary or features as the anaphor triple; and identifying the antecedent of the anaphor in the related preceding triple.

In the step of finding the anaphor triple, a given triple may be determined to be an anaphor triple when at least one of an expression containing a demonstrative, a common noun, a pronoun, and a zero-anaphor marker appears in at least one of the subject and object positions of that triple. The zero-anaphor marker may be information indicating that the subject or object of the triple is empty.

When the anaphor has the form of a demonstrative plus a noun, the step of finding the related preceding triple may search the preceding triples for a subject or object containing the noun, and the step of extracting the antecedent may extract the expression in the subject or object found as the antecedent.

If no subject or object containing the noun is found among the preceding triples, the step of finding the related preceding triple may search for a subject or object containing a word with the same features as the noun.

When the anaphor is in the subject position, the step of finding the related preceding triple may compare the objects and predicates of the preceding triples with those of the anaphor triple to find a related preceding triple that has the same vocabulary or features as the anaphor triple.

The step of identifying the antecedent may identify the subject of the related preceding triple as the antecedent.

When the anaphor is in the object position, the step of finding the related preceding triple may compare the subjects and predicates of the preceding triples with those of the anaphor triple to find a related preceding triple that has the same vocabulary or features as the anaphor triple.

The step of identifying the antecedent may identify the object of the related preceding triple as the antecedent.

The anaphora resolution method may further include restoring the anaphor triple by replacing the anaphor with the antecedent, and storing the restored anaphor triple.

According to an embodiment of the present invention, the meaning of sentences and their context information are expressed as RDF triples, so the antecedent can be identified more accurately than with methods that determine it from the degree of agreement of individual grammatical and semantic features.

According to an embodiment of the present invention, the antecedent of an anaphor can be identified accurately even when several antecedent candidates with similar grammatical features (gender, number, person) and semantic features (person, thing, place, food, organization) are present.

In addition, according to an embodiment of the present invention, the antecedent of a zero anaphor can be identified accurately. Better performance can therefore be expected from automatic natural language processing apparatuses.

FIG. 1 is a block diagram of an anaphora resolution apparatus according to one embodiment of the present invention.
FIG. 2 is a block diagram of an anaphora processing unit according to one embodiment of the present invention.
FIG. 3 is a flowchart of an anaphora resolution method according to one embodiment of the present invention.
FIG. 4 is a flowchart of an anaphora resolution method according to another embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. Parts unrelated to the description are omitted from the drawings for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

An anaphora resolution apparatus and method according to embodiments of the present invention are now described with reference to the drawings.

FIG. 1 is a block diagram of an anaphora resolution apparatus according to one embodiment of the present invention.

Referring to FIG. 1, the anaphora resolution apparatus 100 converts the sentences that make up a text into Resource Description Framework (RDF) triples. The anaphora resolution apparatus 100 extracts the anaphors contained in the triples and identifies, among the preceding triples, the antecedent corresponding to each anaphor.

The RDF triple is an international standard administered by the World Wide Web Consortium (W3C), and represents knowledge and information as a set of three elements: subject (resource), predicate (property), and object (literal).

An anaphor cannot refer independently; it is specified by another expression in the sentence, namely its antecedent. For example, a demonstrative, a demonstrative plus a noun, a pronoun, or a common noun may be an anaphor. In addition, when a variable such as "?x" or "?y" appears in the subject or object position of a converted RDF triple, the empty subject or object may be a zero anaphor.

The anaphora resolution apparatus 100 includes an input text analysis unit 200, an RDF triple conversion unit 300, an anaphora processing unit 400, and an RDF triple repository 500.

The input text analysis unit 200 includes an input unit 210, a morphological analysis unit 230, an eojeol (word phrase) generation unit 250, and a sentence constituent analysis unit 270.

The input unit 210 receives an input text. The input text may consist of a plurality of sentences. The input unit 210 may receive text input such as a text file or a web document.

The morphological analysis unit 230 uses a morphological analyzer to break the input text into morphemes. A morpheme is the smallest meaningful unit among the elements that make up a sentence.

The eojeol generation unit 250 generates eojeols from the morphemes. An eojeol is a sentence element delimited by spaces in a correctly spelled sentence. By part of speech, eojeols are classified as nominals (NN), predicates (VV), the positive copula (VNP), determiners (MM), adverbs (MA), interjections (IC), and conjunctions (CONJ).

The sentence constituent analysis unit 270 analyzes the role each eojeol plays in the sentence, that is, its sentence constituent. Sentence constituents are classified as subject (SBJ), object (OBJ), predicate (PRD), complement (CMP), modifier (MOD), adjunct (AJT), conjunctive (CNJ), and independent (INT).

The RDF triple conversion unit 300 divides each sentence into sentence blocks based on the sentence constituents output by the input text analysis unit 200, and generates sentence segmentation information. Sentence blocks include the nominal block (N), compound noun block (N), proper noun block (P), unit noun block (U), genitive block (G), coordinate block (O), predicate block (V), adnominal block (C), adverbial block (B), clause block (S), and interrogative block (Q).

The RDF triple conversion unit 300 converts each sentence into RDF triples based on the sentence constituents output by the input text analysis unit 200 and the sentence segmentation information. An RDF triple consists of a subject, a predicate, and an object.
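A minimal sketch of how such a triple might be held in memory is given here. The class name `Triple` and its field names are illustrative assumptions, not part of the specification; the two values are the triples that the first worked example of this description derives from its two sentences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One subject/predicate/object triple (illustrative name, not from the patent)."""
    subject: str
    predicate: str
    obj: str  # the object slot; called obj to avoid shadowing Python's built-in name


# Triples for "센솔로지는 SNS 대쉬보드 기술을 개발하였다. 이 기술은 활용가치가 크다."
sentence1 = Triple("센솔로지", "개발", "SNS 대쉬보드 기술")
sentence2 = Triple("이 기술", "크다", "활용가치")

print(sentence1)
print(sentence2)
```

A frozen dataclass is used so that a stored triple cannot be modified in place; a restored triple is then a new value rather than an edit of the original.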

When an RDF triple contains an anaphor, the anaphora processing unit 400 identifies the antecedent corresponding to the anaphor in order to restore its meaning.

The anaphor extraction unit 410 finds, among the triples produced by the RDF triple conversion unit 300, the anaphor triples that contain an anaphor. The anaphora processing unit 400 stores the triples that contain no anaphor in the RDF triple repository 500.

The anaphora processing unit 400 compares the anaphor triple with the preceding triples stored in the RDF triple repository 500, and identifies the antecedent of the anaphor based on antecedent identification conditions. These conditions judge the similarity between triples, and include conditions that compare the form of the anaphor, the words contained in the anaphor, the features of those words, the words contained in a triple, and the features of the words contained in a triple. For example, the anaphora processing unit 400 may search for a preceding triple that contains a word contained in the anaphor. Alternatively, the anaphora processing unit 400 searches for a preceding triple that contains the same words or features as the subject, predicate, or object of the anaphor triple. Here, an anaphor triple is an RDF triple that contains an anaphor in its subject or object. Features include information describing various properties of a word, such as grammatical features (gender, number, person) and semantic features (person, thing, place, food, organization).

The anaphora processing unit 400 generates a restored triple by replacing the anaphor with the antecedent. The RDF triple restoration unit 450 stores the restored triple in the RDF triple repository 500.

FIG. 2 is a block diagram of an anaphora processing unit according to one embodiment of the present invention.

Referring to FIG. 2, the anaphora processing unit 400 includes an anaphor extraction unit 410, an antecedent identification unit 430, and an RDF triple restoration unit 450.

The anaphor extraction unit 410 determines whether there is an anaphor in the subject or object position. It extracts as an anaphor any demonstrative, demonstrative plus noun, common noun, or zero-anaphor marker found in the subject or object position. Here, the zero-anaphor marker is information indicating that the position is empty, and may be, for example, a variable such as "?x" or "?y".
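The categories checked by the extraction unit can be sketched as a small classifier. This is only an illustration: the function name `anaphor_kind`, the tiny `DEMONSTRATIVES` and `PRONOUNS` lexicons, and the caller-supplied `common_nouns` set are all assumptions. The specification does not say how demonstratives or common nouns are recognized; in practice that information would presumably come from the morphological analysis.

```python
# Hypothetical mini-lexicons; a real system would rely on part-of-speech tags.
DEMONSTRATIVES = {"이", "그", "저"}
PRONOUNS = {"거", "것", "그녀"}


def anaphor_kind(term, common_nouns=frozenset()):
    """Return the anaphor category of a subject/object term, or None if it is not one."""
    if term.startswith("?"):
        # Variables such as "?x" or "?y" mark an empty slot (zero anaphor).
        return "zero"
    tokens = term.split()
    if tokens and tokens[0] in DEMONSTRATIVES:
        return "demonstrative+noun" if len(tokens) > 1 else "demonstrative"
    if term in PRONOUNS:
        return "pronoun"
    if term in common_nouns:
        return "common noun"
    return None


print(anaphor_kind("이 기술"))
print(anaphor_kind("?y"))
print(anaphor_kind("음악", common_nouns={"음악"}))
print(anaphor_kind("센솔로지"))
```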

The anaphor extraction unit 410 stores, in the RDF triple repository 500, those triples produced by the RDF triple conversion unit 300 that contain no anaphor.

The antecedent identification unit 430 compares the anaphor triple with the preceding triples stored in the RDF triple repository 500. A preceding triple is a triple from a sentence that comes before the sentence of the anaphor triple, and may contain the antecedent.

The antecedent identification unit 430 uses the antecedent identification conditions to find a preceding triple that has the same words or features as the anaphor triple. It then identifies, in the preceding triple found, the antecedent corresponding to the anaphor. The antecedent identification unit 430 may apply the antecedent identification conditions according to the form of the anaphor, the words contained in the anaphor, the features of the words contained in a triple, the similarity between triples, and so on.

The RDF triple restoration unit 450 generates a restored triple by replacing the anaphor with the antecedent, and stores the restored triple in the RDF triple repository 500.
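The restoration step can be sketched as follows, assuming the repository is simply an in-memory list; the specification does not describe the repository's storage format. `Triple`, `restore`, and the `slot` parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def restore(triple, slot, antecedent, repository):
    """Replace the anaphor in `slot` ("subject" or "obj") and store the new triple."""
    restored = replace(triple, **{slot: antecedent})
    repository.append(restored)  # the repository is modeled as a plain list here
    return restored


repository = [Triple("센솔로지", "개발", "SNS 대쉬보드 기술")]
anaphor_triple = Triple("이 기술", "크다", "활용가치")
restored = restore(anaphor_triple, "subject", "SNS 대쉬보드 기술", repository)
print(restored)
```

Storing the restored triple means later anaphors can be compared against it, since it then counts as a preceding triple.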

The operation of the anaphora processing unit 400 is described below by way of examples.

The first example is as follows.

The RDF triple conversion unit 300 converts each of the sentences "센솔로지는 SNS 대쉬보드 기술을 개발하였다. 이 기술은 활용가치가 크다." ("Sensology developed SNS dashboard technology. This technology has great utility value.") into RDF triples as shown in Table 1.

Table 1

| | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 1 | 센솔로지 (Sensology) | 개발 (develop) | SNS 대쉬보드 기술 (SNS dashboard technology) |
| Sentence 2 | 이 기술 (this technology) | 크다 (be great) | 활용가치 (utility value) |

The anaphor extraction unit 410 extracts "이 기술" (this technology), which has the form "demonstrative + noun", from the triple of sentence 2 as an anaphor.

The antecedent identification unit 430 identifies the antecedent of the anaphor based on the antecedent identification conditions. When the anaphor has the form "demonstrative + noun", the antecedent identification unit 430 searches the preceding triples for a target whose form contains the noun in the anaphor, and identifies that target as the antecedent. That is, it identifies the object of sentence 1, "SNS 대쉬보드 기술" (SNS dashboard technology), whose form contains the word "기술" (technology) from "이 기술", as the antecedent.

The RDF triple restoration unit 450 restores the triple of sentence 2 as shown in Table 2.

Table 2

| | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 2 | 이 기술 (this technology) | 크다 (be great) | 활용가치 (utility value) |
| Restored sentence 2 | SNS 대쉬보드 기술 (SNS dashboard technology) | 크다 (be great) | 활용가치 (utility value) |
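The lexical match in this first example can be sketched as below. Two points are assumptions rather than statements from the specification: "contains the noun in form" is approximated by a plain substring test, and preceding triples are scanned nearest first. `find_by_noun` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def find_by_noun(noun, preceding):
    """Return the first subject/object among preceding triples that contains `noun`."""
    for triple in reversed(preceding):          # nearest preceding triple first
        for candidate in (triple.subject, triple.obj):
            if noun in candidate:               # substring test as a stand-in
                return candidate
    return None


preceding = [Triple("센솔로지", "개발", "SNS 대쉬보드 기술")]
anaphor = "이 기술"
noun = anaphor.split(maxsplit=1)[1]             # drop the demonstrative "이"
antecedent = find_by_noun(noun, preceding)
print(antecedent)
```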

The second example is as follows.

The RDF triple conversion unit 300 converts each of the sentences "센솔로지는 RDF 변환 기술을 개발하였다. 이 회사는 관련특허도 확보하였다." ("Sensology developed RDF conversion technology. This company has also secured related patents.") into RDF triples as shown in Table 3.

Table 3

| | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 3 | 센솔로지 (Sensology) | 개발 (develop) | RDF 변환 기술 (RDF conversion technology) |
| Sentence 4 | 이 회사 (this company) | 확보 (secure) | 관련특허 (related patents) |

The anaphor extraction unit 410 extracts "이 회사" (this company), which has the form "demonstrative + noun", from the triple of sentence 4 as an anaphor.

When the anaphor has the form "demonstrative + noun", the antecedent identification unit 430 searches the preceding triples for a target whose form contains the noun in the anaphor. If no triple contains that noun in form, the antecedent identification unit 430 searches the preceding triples for a target that has the same features as the noun in the anaphor. In doing so, it works backward from the preceding triple nearest to the anaphor triple, looking for a target with similar features. That is, because no preceding triple contains the word "회사" (company) from "이 회사" in form, the antecedent identification unit 430 identifies the subject of sentence 3, "센솔로지" (Sensology), which has the same feature as "회사", as the antecedent. Here, "회사" and "센솔로지" share the feature "organization".

The RDF triple restoration unit 450 restores the triple of sentence 4 as shown in Table 4.

Table 4

| | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 4 | 이 회사 (this company) | 확보 (secure) | 관련특허 (related patents) |
| Restored sentence 4 | 센솔로지 (Sensology) | 확보 (secure) | 관련특허 (related patents) |
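The fallback from lexical match to feature match in this second example might look like the sketch below. The `FEATURES` table is a hypothetical stand-in for whatever feature lexicon the apparatus uses, and "same features" is approximated as "shares at least one feature"; both are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Hypothetical semantic-feature lexicon.
FEATURES = {
    "회사": {"organization"},
    "센솔로지": {"organization"},
    "RDF 변환 기술": {"technology"},
}


def find_antecedent(noun, preceding):
    """Try a lexical match first, then fall back to a shared-feature match."""
    candidates = [c for t in reversed(preceding) for c in (t.subject, t.obj)]
    for candidate in candidates:                # pass 1: noun contained in form
        if noun in candidate:
            return candidate
    wanted = FEATURES.get(noun, set())
    for candidate in candidates:                # pass 2: nearest triple first
        if wanted & FEATURES.get(candidate, set()):
            return candidate
    return None


preceding = [Triple("센솔로지", "개발", "RDF 변환 기술")]
print(find_antecedent("회사", preceding))
```

Because `candidates` is built from the reversed list, both passes walk backward from the triple nearest to the anaphor triple, as the description requires for the feature search.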

The third example is as follows.

The RDF triple conversion unit 300 converts each of the sentences "월요일에는 힐링캠프를 하고, 화요일에는 화신을 하고, 수요일에는 짝을 합니다. 월요일에 하는 걸 녹화해." ("힐링캠프 (Healing Camp) is on Monday, 화신 (Hwasin) is on Tuesday, and 짝 (Jjak) is on Wednesday. Record the one that is on Monday.") into RDF triples as shown in Table 5.

Table 5

| | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 5-1 | 힐링캠프 (Healing Camp) | 하다 (do) | 월요일 (Monday) |
| Sentence 5-2 | 화신 (Hwasin) | 하다 (do) | 화요일 (Tuesday) |
| Sentence 5-3 | 짝 (Jjak) | 하다 (do) | 수요일 (Wednesday) |
| Sentence 6-1 | 거 (thing, one) | 하다 (do) | 월요일 (Monday) |
| Sentence 6-2 | ?x | 녹화 (record) | 거 (thing, one) |

The anaphor extraction unit 410 extracts the pronoun "거" (thing, one) from the triple of sentence 6-1 as an anaphor.

The antecedent identification unit 430 searches the preceding triples for one that matches the triple of sentence 6-1, which contains the anaphor. Because it must find the antecedent of the subject "거", it looks for a preceding triple with the same predicate and object, and identifies the subject of that preceding triple as the antecedent. That is, it finds the triple of sentence 5-1, which contains the predicate "하다" (do) and the object "월요일" (Monday) of sentence 6-1, and identifies the subject of sentence 5-1, "힐링캠프" (Healing Camp), as the antecedent.

The RDF triple restoration unit 450 restores the triple of sentence 6-1 as shown in Table 6.

Table 6

| | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 6-1 | 거 (thing, one) | 하다 (do) | 월요일 (Monday) |
| Restored sentence 6-1 | 힐링캠프 (Healing Camp) | 하다 (do) | 월요일 (Monday) |
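A sketch of the subject-position rule used in this third example follows. Exact string equality on predicate and object is an assumption standing in for the "same vocabulary or features" comparison, and `resolve_subject` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def resolve_subject(anaphor_triple, preceding):
    """Find a preceding triple with the same predicate and object; return its subject."""
    for triple in reversed(preceding):
        if (triple.predicate == anaphor_triple.predicate
                and triple.obj == anaphor_triple.obj):
            return triple.subject
    return None


preceding = [
    Triple("힐링캠프", "하다", "월요일"),   # sentence 5-1
    Triple("화신", "하다", "화요일"),       # sentence 5-2
    Triple("짝", "하다", "수요일"),         # sentence 5-3
]
antecedent = resolve_subject(Triple("거", "하다", "월요일"), preceding)
print(antecedent)
```

All three candidate subjects are television programs, so feature agreement alone would not separate them; the match on the object "월요일" is what selects sentence 5-1.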

The fourth example is as follows.

The RDF triple conversion unit 300 converts each of the sentences "어제는 재즈를 들었다. 음악을 들으며 그녀를 생각했다." ("Yesterday I listened to jazz. Listening to the music, I thought of her.") into RDF triples as shown in Table 7.

Table 7

| | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 7-1 | ?x | 때 (time) | 어제 (yesterday) |
| Sentence 7-2 | ?x | 듣다 (listen) | 재즈 (jazz) |
| Sentence 8-1 | ?x | 듣다 (listen) | 음악 (music) |
| Sentence 8-2 | ?x | 생각하다 (think) | 그녀 (her) |

The anaphor extraction unit 410 extracts the common noun "음악" (music) from the triple of sentence 8-1 as an anaphor.

The antecedent identification unit 430 searches the preceding triples for one that matches the triple of sentence 8-1, which contains the anaphor. Because it must find the antecedent of the object "음악", it looks for a preceding triple with the same subject and predicate, and identifies the object of that preceding triple as the antecedent. That is, it finds the triple of sentence 7-2, which contains the predicate "듣다" (listen) of sentence 8-1, and identifies the object of sentence 7-2, "재즈" (jazz), as the antecedent.

The RDF triple restoration unit 450 restores the triple of sentence 8-1 as shown in Table 8.

| Sentence | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 8-1 | ?x | 듣다 (listen) | 음악 (music) |
| Restored sentence 8-1 | ?x | 듣다 (listen) | 재즈 (jazz) |
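The same comparison can be written once for either position: the slot holding the anaphor is left out, and the two remaining slots are compared. The following Python sketch illustrates this for the fourth embodiment. The dictionary representation and the name `find_antecedent` are hypothetical, and the unresolved subject marker "?x" is compared as an ordinary string, which is an assumption of this sketch.

```python
from typing import Optional

SLOTS = ("subject", "predicate", "object")


def find_antecedent(anaphor: dict, slot: str, preceding: list) -> Optional[str]:
    """Compare the two slots other than `slot`; return the match's value in `slot`."""
    others = [name for name in SLOTS if name != slot]
    # Assumption: the nearest preceding triple is examined first.
    for candidate in reversed(preceding):
        if all(candidate[name] == anaphor[name] for name in others):
            return candidate[slot]
    return None


# Preceding triples from Table 7.
preceding_triples = [
    {"subject": "?x", "predicate": "때", "object": "어제"},    # sentence 7-1
    {"subject": "?x", "predicate": "듣다", "object": "재즈"},  # sentence 7-2
]
sentence_8_1 = {"subject": "?x", "predicate": "듣다", "object": "음악"}

antecedent = find_antecedent(sentence_8_1, "object", preceding_triples)
restored_8_1 = {**sentence_8_1, "object": antecedent}
```

Passing `"subject"` instead of `"object"` gives the subject-position case, in which the predicate and object are compared.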

The fifth embodiment is as follows.

The RDF triple conversion unit 300 converts each of the sentences "철수는 아메리카노를 좋아한다. 영희도 좋아한다" ("Cheolsu likes Americano. Younghee likes [it] too.") into RDF triples as shown in Table 9. If a position in a triple has no corresponding word, the RDF triple conversion unit 300 enters zero-anaphor indication information in that position to mark it as empty.

| Sentence | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 9 | 철수 (Cheolsu) | 좋아하다 (like) | 아메리카노 (Americano) |
| Sentence 10 | 영희 (Younghee) | 좋아하다 (like) | ?y |

The anaphor extraction unit 410 extracts the object "?y" of sentence 10 as the anaphor, based on the zero-anaphor indication information entered in the object of sentence 10.

Among the preceding triples, the antecedent specifying unit 430 searches for a preceding triple that matches the triple of sentence 10, which contains the anaphor. Because the antecedent of the object must be found, the antecedent specifying unit 430 looks for a preceding triple with the same subject and predicate, and specifies the object of the found preceding triple as the antecedent. That is, the antecedent specifying unit 430 finds the triple of sentence 9, which contains the predicate "좋아하다" (like) of sentence 10, and specifies the object "아메리카노" (Americano) of sentence 9 as the antecedent.

The RDF triple restoration unit 450 restores the triple of sentence 10 as shown in Table 10.

| Sentence | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 10 | 영희 (Younghee) | 좋아하다 (like) | ?y |
| Restored sentence 10 | 영희 (Younghee) | 좋아하다 (like) | 아메리카노 (Americano) |
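The handling of a zero anaphor in the fifth embodiment might be sketched in Python as below. The marker strings "?x" and "?y" follow the tables; the helper names (`to_triple`, `is_zero_anaphor`, `resolve_zero_object`) are hypothetical. Because the subjects of sentences 9 and 10 differ, this sketch follows the worked example and matches on the predicate alone, which is an interpretation rather than a rule stated in the text.

```python
from typing import Optional

# Marker strings follow the tables; treating them as per-slot defaults is an assumption.
ZERO_MARKERS = {"subject": "?x", "object": "?y"}


def to_triple(subject: Optional[str], predicate: str, obj: Optional[str]) -> tuple:
    """Fill an empty subject or object slot with zero-anaphor indication information."""
    return (
        subject or ZERO_MARKERS["subject"],
        predicate,
        obj or ZERO_MARKERS["object"],
    )


def is_zero_anaphor(term: str) -> bool:
    return term.startswith("?")


def resolve_zero_object(triple: tuple, preceding: list) -> tuple:
    """Fill a zero-anaphor object from a preceding triple with the same predicate."""
    subject, predicate, obj = triple
    if not is_zero_anaphor(obj):
        return triple
    for _, cand_predicate, cand_obj in reversed(preceding):
        if cand_predicate == predicate and not is_zero_anaphor(cand_obj):
            return (subject, predicate, cand_obj)
    return triple  # left unresolved when no candidate is found


sentence_9 = to_triple("철수", "좋아하다", "아메리카노")
sentence_10 = to_triple("영희", "좋아하다", None)  # object omitted in the input sentence
restored_10 = resolve_zero_object(sentence_10, [sentence_9])
```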

The sixth embodiment will now be described.

The RDF triple conversion unit 300 converts each of the sentences "철수는 아메리카노를 좋아한다. 까페라떼도 좋아한다" ("Cheolsu likes Americano. [He] likes caffè latte too.") into RDF triples as shown in Table 11.

| Sentence | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 11 | 철수 (Cheolsu) | 좋아하다 (like) | 아메리카노 (Americano) |
| Sentence 12 | ?x | 좋아하다 (like) | 까페라떼 (caffè latte) |

The anaphor extraction unit 410 extracts the subject "?x" of sentence 12 as the anaphor, based on the zero-anaphor indication information entered in the subject of sentence 12.

Because the antecedent of the subject must be found, the antecedent specifying unit 430 looks for a preceding triple whose object and predicate are the same as those of the triple of sentence 12. If no identical preceding triple exists, the antecedent specifying unit 430 looks for a preceding triple containing the same feature. The antecedent specifying unit 430 then specifies the subject of the found preceding triple as the antecedent. That is, the antecedent specifying unit 430 finds the triple of sentence 11, which contains the predicate "좋아하다" (like) of sentence 12 and contains "아메리카노" (Americano), which has the same feature as the object "까페라떼" (caffè latte) of sentence 12, and specifies the subject "철수" (Cheolsu) of sentence 11 as the antecedent.

The RDF triple restoration unit 450 restores the triple of sentence 12 as shown in Table 12.

| Sentence | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 12 | ?x | 좋아하다 (like) | 까페라떼 (caffè latte) |
| Restored sentence 12 | 철수 (Cheolsu) | 좋아하다 (like) | 까페라떼 (caffè latte) |
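The two-pass search of the sixth embodiment (exact match first, then same feature) could look roughly like the following Python sketch. The feature lexicon `FEATURES` and its label "Beverage" are invented for illustration; the specification states only that "아메리카노" and "까페라떼" share a feature, without naming it.

```python
from typing import Optional

# Hypothetical feature lexicon; the source does not name the shared feature.
FEATURES = {"아메리카노": "Beverage", "까페라떼": "Beverage", "철수": "Person"}


def share_feature(a: str, b: str) -> bool:
    feature = FEATURES.get(a)
    return feature is not None and feature == FEATURES.get(b)


def resolve_zero_subject(triple: tuple, preceding: list) -> Optional[str]:
    """Find the antecedent of a zero-anaphor subject."""
    _, predicate, obj = triple
    # First pass: identical predicate and identical object.
    for subject, cand_predicate, cand_obj in reversed(preceding):
        if cand_predicate == predicate and cand_obj == obj:
            return subject
    # Second pass: identical predicate and an object sharing a feature.
    for subject, cand_predicate, cand_obj in reversed(preceding):
        if cand_predicate == predicate and share_feature(cand_obj, obj):
            return subject
    return None


sentence_11 = ("철수", "좋아하다", "아메리카노")
sentence_12 = ("?x", "좋아하다", "까페라떼")

antecedent = resolve_zero_subject(sentence_12, [sentence_11])
restored_12 = (antecedent, sentence_12[1], sentence_12[2])
```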

The seventh embodiment will now be described.

The RDF triple conversion unit 300 converts each of the sentences "철수는 언제 서울에 갑니까? 내일 갑니다." ("When is Cheolsu going to Seoul? [He] is going tomorrow.") into RDF triples as shown in Table 13.

| Sentence | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 13-1 | 철수 (Cheolsu) | 가다 (go) | 언제 (when) (Time) |
| Sentence 13-2 | 철수 (Cheolsu) | 가다 (go) | 서울 (Seoul) (Place) |
| Sentence 14 | ?x | 가다 (go) | 내일 (tomorrow) (Time) |

The anaphor extraction unit 410 extracts the subject "?x" of sentence 14 as the anaphor, based on the zero-anaphor indication information entered in the subject of sentence 14.

Because the antecedent of the subject must be found, the antecedent specifying unit 430 looks for a preceding triple whose object and predicate are the same as, or have the same features as, those of the triple of sentence 14. The antecedent specifying unit 430 then specifies the subject of the found preceding triple as the antecedent. That is, the antecedent specifying unit 430 finds the triple of sentence 13-1, which contains the predicate "가다" (go) of sentence 14 containing the zero anaphor and contains "언제" (when), which shares the feature "Time" with the object "내일" (tomorrow) of sentence 14, and specifies the subject "철수" (Cheolsu) of sentence 13-1 as the antecedent.

The RDF triple restoration unit 450 restores the triple of sentence 14 as shown in Table 14.

| Sentence | Subject | Predicate | Object |
|---|---|---|---|
| Sentence 14 | ?x | 가다 (go) | 내일 (tomorrow) (Time) |
| Restored sentence 14 | 철수 (Cheolsu) | 가다 (go) | 내일 (tomorrow) (Time) |
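In Tables 13 and 14 the feature is written in parentheses after the word. A minimal Python sketch of feature-based selection under that notation is given below. The parsing of the "word(Feature)" string and the names `split_feature` and `find_related` are assumptions made for illustration; the specification does not say how features are stored.

```python
import re
from typing import Optional, Tuple

# Assumed notation: a trailing "(Feature)" tag, as printed in Tables 13 and 14.
TAGGED = re.compile(r"^(.+)\((\w+)\)$")


def split_feature(term: str) -> Tuple[str, Optional[str]]:
    """Split '내일(Time)' into ('내일', 'Time'); untagged terms get feature None."""
    match = TAGGED.match(term)
    return (match.group(1), match.group(2)) if match else (term, None)


def find_related(anaphor: tuple, preceding: list) -> Optional[tuple]:
    """Return a preceding triple with the same predicate whose object is the
    same word as, or carries the same feature as, the anaphor triple's object."""
    _, predicate, obj = anaphor
    word, feature = split_feature(obj)
    for candidate in preceding:
        cand_word, cand_feature = split_feature(candidate[2])
        same_object = cand_word == word or (
            feature is not None and cand_feature == feature
        )
        if candidate[1] == predicate and same_object:
            return candidate
    return None


table_13 = [
    ("철수", "가다", "언제(Time)"),   # sentence 13-1
    ("철수", "가다", "서울(Place)"),  # sentence 13-2
]
sentence_14 = ("?x", "가다", "내일(Time)")

related = find_related(sentence_14, table_13)
restored_14 = (related[0], sentence_14[1], sentence_14[2])
```

Sentence 13-2 has the same predicate but carries the feature "Place", so under these assumptions only sentence 13-1 qualifies as the related preceding triple.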

FIG. 3 is a flowchart of an anaphora resolution method according to one embodiment of the present invention.

Referring to FIG. 3, the anaphora resolution apparatus 100 receives an input text (S110).

The anaphora resolution apparatus 100 analyzes the sentence constituents of the input text (S120).

The anaphora resolution apparatus 100 converts each sentence into RDF triples based on the sentence constituents of the input text (S130).

The anaphora resolution apparatus 100 finds, among the plurality of triples, an anaphor triple containing an anaphor (S140).

The anaphora resolution apparatus 100 compares the anaphor triple with its preceding triples to specify the antecedent corresponding to the anaphor (S150).

The anaphora resolution apparatus 100 restores the anaphor triple by replacing the anaphor with the antecedent (S160).

The anaphora resolution apparatus 100 stores the restored anaphor triple (S170).

FIG. 4 is a flowchart of an anaphora resolution method according to another embodiment of the present invention.

Referring to FIG. 4, the anaphora processing unit 400 receives a plurality of RDF triples converted from an input text (S210).

The anaphora processing unit 400 determines whether each triple contains an anaphor (S220). The anaphora processing unit 400 checks whether an anaphor is present in the subject or object position, and extracts as an anaphor any demonstrative, demonstrative + noun, common noun, or zero-anaphor indication information located in the subject or object position.
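The check in step S220 could be sketched in Python as follows, assuming each subject and object arrives with a category tag from the preceding analysis. The tag names and the `(text, tag)` pair representation are hypothetical; the specification lists the categories but does not define a tag set.

```python
# Hypothetical tags for the categories listed in step S220.
ANAPHOR_TAGS = {"DEMONSTRATIVE", "DEMONSTRATIVE+NOUN", "COMMON_NOUN", "ZERO"}


def anaphor_slots(triple: dict) -> list:
    """Return the positions (subject and/or object) that hold an anaphor candidate."""
    return [
        slot
        for slot in ("subject", "object")
        if triple[slot][1] in ANAPHOR_TAGS
    ]


# Each slot is a (text, tag) pair; the tags are assigned here by hand.
sentence_10 = {
    "subject": ("영희", "PROPER_NOUN"),
    "predicate": ("좋아하다", "VERB"),
    "object": ("?y", "ZERO"),
}
sentence_13_2 = {
    "subject": ("철수", "PROPER_NOUN"),
    "predicate": ("가다", "VERB"),
    "object": ("서울", "PROPER_NOUN"),
}

slots_10 = anaphor_slots(sentence_10)
slots_13_2 = anaphor_slots(sentence_13_2)
```

A triple with an empty result contains no anaphor candidate and could be stored directly.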

The anaphora processing unit 400 stores triples that do not contain an anaphor in the RDF triple repository 500 (S230).

When an anaphor triple containing an anaphor is found, the anaphora processing unit 400 finds, based on antecedent specifying conditions, a related preceding triple having the same words or features as the anaphor triple (S240). The antecedent specifying conditions include the form of the anaphor, the words contained in the anaphor, the features of the words contained in a triple, and the similarity between triples. For example, when the anaphor has the form "demonstrative + noun", the anaphora processing unit 400 searches the preceding triples for a target that morphologically contains the noun included in the anaphor. If no triple morphologically contains that noun, the anaphora processing unit 400 searches the preceding triples for a target having the same feature as the noun. When the subject contains the anaphor, the anaphora processing unit 400 finds a preceding triple whose words, or word features, match those in the predicate and object of the anaphor triple. When the object contains the anaphor, the anaphora processing unit 400 finds a preceding triple whose words, or word features, match those in the predicate and subject of the anaphor triple.
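For the "demonstrative + noun" case of step S240, a minimal Python sketch is shown below. Everything in the example data is hypothetical: the anaphor "그 프로그램" (that program), the preceding triples, and the feature label "Broadcast" are invented to illustrate the two passes, and splitting on a space to isolate the noun is an assumption.

```python
from typing import Optional


def head_noun(anaphor: str) -> str:
    """Assume the noun is the last space-separated token of the anaphor."""
    return anaphor.split()[-1]


def find_target(anaphor: str, preceding: list, features: dict) -> Optional[str]:
    """Search subjects and objects of preceding triples for a matching target."""
    noun = head_noun(anaphor)
    # Assumption: nearer preceding triples are examined first.
    terms = [
        term
        for triple in reversed(preceding)
        for term in (triple[0], triple[2])
    ]
    # First pass: a term that morphologically contains the noun.
    for term in terms:
        if noun in term:
            return term
    # Second pass: a term with the same feature as the noun.
    wanted = features.get(noun)
    if wanted is not None:
        for term in terms:
            if features.get(term) == wanted:
                return term
    return None


# Hypothetical data, not taken from the specification.
by_form = find_target("그 프로그램", [("철수", "보다", "예능프로그램")], {})
by_feature = find_target(
    "그 프로그램",
    [("철수", "보다", "힐링캠프")],
    {"프로그램": "Broadcast", "힐링캠프": "Broadcast"},
)
```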

The anaphora processing unit 400 specifies, in the related preceding triple, the antecedent corresponding to the anaphor (S250). For example, when the anaphor has the form "demonstrative + noun", the anaphora processing unit 400 specifies as the antecedent the target in the preceding triples that morphologically contains the noun or has the same feature as the noun. When no target has the same word or feature, the anaphora processing unit 400 specifies as the antecedent the target that occupies the same position as the anaphor in the found preceding triple. That is, if the anaphor is in the subject position, the target in the subject of the preceding triple may be specified as the antecedent, and if the anaphor is in the object position, the target in the object of the preceding triple may be specified as the antecedent.
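Step S250 can be read as a lexical or feature match with a positional fallback. The Python sketch below illustrates that reading. The function name `specify_antecedent`, the dictionary representation, and the "그 커피" (that coffee) example with its "Beverage" feature are assumptions made for illustration only.

```python
from typing import Optional


def specify_antecedent(
    related: dict,
    anaphor_slot: str,
    noun: Optional[str] = None,
    features: Optional[dict] = None,
) -> str:
    """Pick the antecedent inside an already-found related preceding triple."""
    features = features or {}
    if noun is not None:
        wanted = features.get(noun)
        for slot in ("subject", "object"):
            term = related[slot]
            # Prefer a term that contains the noun or shares its feature.
            if noun in term or (wanted is not None and features.get(term) == wanted):
                return term
    # Fallback: the term in the same position as the anaphor.
    return related[anaphor_slot]


sentence_9 = {"subject": "철수", "predicate": "좋아하다", "object": "아메리카노"}

# Zero anaphor in the object position: positional fallback applies.
positional = specify_antecedent(sentence_9, "object")

# Hypothetical anaphor "그 커피" in the subject position: the feature match wins.
by_feature = specify_antecedent(
    sentence_9,
    "subject",
    noun="커피",
    features={"커피": "Beverage", "아메리카노": "Beverage"},
)
```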

The anaphora processing unit 400 restores the anaphor triple by replacing the anaphor with the antecedent (S260).

The anaphora processing unit 400 stores the restored anaphor triple in the RDF triple repository (S270).
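The flow of steps S210 to S270 could be combined into a single loop, as in the simplified Python sketch below. It is deliberately narrow: it treats only zero-anaphor markers (terms beginning with "?") as anaphors, matches on the predicate alone, uses a plain list as the repository, draws antecedent candidates from triples already stored, and stores unresolved triples unchanged. All of these are assumptions of the sketch, not statements of the specification.

```python
def is_zero(term: str) -> bool:
    return term.startswith("?")


def process(triples: list) -> list:
    """Run a simplified S210-S270 loop and return the repository contents."""
    repository = []
    for subject, predicate, obj in triples:              # S210: receive triples
        restored = [subject, predicate, obj]
        for index in (0, 2):                             # S220: check subject and object
            if not is_zero(restored[index]):
                continue
            for candidate in reversed(repository):       # S240: related preceding triple
                if candidate[1] == predicate and not is_zero(candidate[index]):
                    restored[index] = candidate[index]   # S250, S260: specify and replace
                    break
        repository.append(tuple(restored))               # S230 / S270: store
    return repository


table_9 = [("철수", "좋아하다", "아메리카노"), ("영희", "좋아하다", "?y")]
table_11 = [("철수", "좋아하다", "아메리카노"), ("?x", "좋아하다", "까페라떼")]

stored_9 = process(table_9)
stored_11 = process(table_11)
```

With the triples of Tables 9 and 11 as input, the last stored triple in each case corresponds to the restored rows of Tables 10 and 12.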

As described above, according to an embodiment of the present invention, the meaning and context information of a sentence are expressed as RDF triples, so the antecedent can be specified more accurately than with methods that judge the antecedent by the degree of agreement of individual grammatical and semantic features. According to an embodiment of the present invention, the antecedent corresponding to an anaphor can be accurately specified even when antecedent candidates with similar grammatical features, such as gender, number, and person, and similar semantic features, such as person, thing, place, food, and organization, appear in sequence. In addition, the antecedent corresponding to a zero anaphor can be accurately specified. Therefore, improved performance can be expected from automatic natural language processing apparatuses.

The embodiments of the present invention described above are not implemented only through the apparatus and method; they may also be implemented through a program that realizes functions corresponding to the configurations of the embodiments, or through a recording medium on which the program is recorded.

Although embodiments of the present invention have been described in detail above, the scope of rights of the present invention is not limited thereto; various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of rights of the present invention.

Claims

1. An anaphora resolution apparatus based on Resource Description Framework (RDF) triples, comprising:
an RDF triple conversion unit that converts each sentence included in an input text into RDF triples; and
an anaphora processing unit that finds, among the plurality of triples converted by the RDF triple conversion unit, an anaphor triple containing an anaphor, and specifies an antecedent of the anaphor by comparing the anaphor triple with preceding triples of the anaphor triple,
wherein the anaphora processing unit
finds, among the preceding triples, at least one related preceding triple having the same word or the same feature as the anaphor triple, and
compares at least one of the subject, predicate, and object of the anaphor triple with at least one of the subject, predicate, and object of the at least one related preceding triple to specify the antecedent of the anaphor in the at least one related preceding triple.

2. The apparatus of claim 1,
wherein the anaphora processing unit
extracts, from the plurality of triples, a triple having an anaphor in the subject position or the object position as the anaphor triple.

3. The apparatus of claim 1,
wherein the anaphora processing unit
determines, as the anaphor, a word containing a demonstrative, a common noun, a pronoun, or zero-anaphor indication information indicating that the subject or object of an RDF triple is empty.

4. (Deleted)

5. The apparatus of claim 1,
wherein the anaphora processing unit,
when the anaphor has a form including a demonstrative and a noun, finds a subject or object containing the noun among the preceding triples and specifies the word in the found subject or object as the antecedent.

6. The apparatus of claim 5,
wherein the anaphora processing unit,
when no subject or object containing the noun is found among the preceding triples, searches for a subject or object containing a word having the same feature as the noun.

7. The apparatus of claim 1,
wherein the anaphora processing unit,
when the anaphor is located in the subject position, compares the object and predicate of the preceding triples with the object and predicate of the anaphor triple to find the related preceding triple, and specifies the subject of the related preceding triple as the antecedent.

8. The apparatus of claim 7,
wherein the anaphora processing unit,
when the anaphor is located in the object position, compares the subject and predicate of the preceding triples with the subject and predicate of the anaphor triple to find the related preceding triple, and specifies the object of the related preceding triple as the antecedent.

9. The apparatus of claim 1,
wherein the anaphora processing unit
restores the anaphor triple by replacing the anaphor with the antecedent, and stores the restored anaphor triple.

10. The apparatus of claim 1,
wherein the anaphora processing unit comprises:
an anaphor extraction unit that extracts an anaphor triple containing an anaphor from the plurality of triples converted by the RDF triple conversion unit;
an antecedent specifying unit that specifies the antecedent of the anaphor by comparing the anaphor triple with the preceding triples of the anaphor triple; and
an RDF triple restoration unit that restores the anaphor triple by replacing the anaphor with the antecedent.

11. An anaphora resolution method based on Resource Description Framework (RDF) triples, comprising:
receiving a plurality of RDF triples corresponding to the respective sentences of an input text;
finding, among the plurality of received triples, an anaphor triple containing an anaphor;
finding, among preceding triples of the anaphor triple, at least one related preceding triple having the same word or feature as the anaphor triple; and
comparing at least one of the subject, predicate, and object of the anaphor triple with at least one of the subject, predicate, and object of the at least one related preceding triple to specify the antecedent of the anaphor in the at least one related preceding triple.

12. The method of claim 11,
wherein the finding of the anaphor triple comprises
determining an arbitrary triple to be the anaphor triple when at least one of a word containing a demonstrative, a common noun, a pronoun, and zero-anaphor indication information is present in at least one of the subject and the object of the triple,
wherein the zero-anaphor indication information is information indicating that the subject or object of the triple is empty.

13. The method of claim 11,
wherein the finding of the related preceding triple comprises,
when the anaphor has a form including a demonstrative and a noun, searching the preceding triples for a subject or object containing the noun, and
wherein the specifying of the antecedent comprises
specifying the word in the found subject or object as the antecedent.

14. The method of claim 13,
wherein the finding of the related preceding triple comprises,
when no subject or object containing the noun is found among the preceding triples, searching for a subject or object having the same feature as the noun.

15. The method of claim 11,
wherein the finding of the related preceding triple comprises,
when the anaphor is located in the subject position, comparing the object and predicate of the preceding triples with the object and predicate of the anaphor triple to find a related preceding triple having the same word or feature as the anaphor triple.

16. The method of claim 15,
wherein the specifying of the antecedent comprises
specifying the subject of the related preceding triple as the antecedent.

17. The method of claim 11,
wherein the finding of the related preceding triple comprises
comparing the subject and predicate of the preceding triples with the subject and predicate of the anaphor triple to find a related preceding triple having the same word or feature as the anaphor triple.

18. The method of claim 17,
wherein the specifying of the antecedent comprises
specifying the object of the related preceding triple as the antecedent.

19. The method of claim 11, further comprising:
restoring the anaphor triple by replacing the anaphor with the antecedent; and
storing the restored anaphor triple.