KR102338949B1

KR102338949B1 - System for Supporting Translation of Technical Sentences

Info

Publication number: KR102338949B1
Application number: KR1020200020320A
Authority: KR
Inventors: 이영호
Original assignee: 이영호
Priority date: 2020-02-19
Filing date: 2020-02-19
Publication date: 2021-12-10
Also published as: KR20210105626A

Abstract

The present invention relates to a technical document translation support system that supports translation work on technical documents. It improves the efficiency and quality of technical document translation by enabling a translator to quickly grasp the specialized vocabulary and terms of each technical field and by minimizing human error.
To this end, the invention is characterized by comprising: a translation support DB 100 that stores, divided by technical classification, reference data sets 10 each consisting of first reference data 10a of a first language formed of minimum unit sentences and second reference data 10b of a second language corresponding to the first reference data 10a; a user terminal 200 that is linked with the translation support DB 100 and loads the reference data sets 10; and a translation support application 300 installed on the user terminal 200. The translation support application 300 includes a translation work module 311 that outputs translation target data 20a, obtained by dividing an original text of the first language into minimum unit sentences, and receives translation performance data 20b of the second language corresponding to the translation target data 20a; a search module 312 that receives a search word and generates search word information; and a reference module 313 that searches the translation support DB 100 for, and outputs, the reference data sets 10 that match the technical classification of the translation target data 20a and contain one or more items of the search word information.

Description

Technical Document Translation Support System {System for Supporting Translation of Technical Sentences}

The present invention relates to a translation support system and, more particularly, to a technical document translation support system that supports translation work on technical documents.

Representative technical documents include patent specifications and academic papers. In the case of patent specifications, through which intellectual property rights can be secured, the principle of territoriality means that an applicant who wants patent protection in a given country must file an application in that country, and in most countries the patent document must be translated into, or drafted in, the language of that country before filing.

Such technical documents are generally written in the specialized vocabulary and terms of the relevant technical field, so they need to be translated by an expert with subject knowledge of that field. In particular, when a patent specification is translated, the scope of rights may change depending on the meaning of a word, so vocabulary must be chosen carefully during translation.

However, it is difficult to find translators who are proficient in the field to which a technical document belongs, and even when a suitable translator is available, the nature of technical documents makes translation time-consuming, which increases the translation cost accordingly.

In addition, technical documents often describe technical components with reference to drawings, and in such cases human errors readily occur: a component term may not match its reference numeral, or the same term may be translated inconsistently. Typos and clerical errors of this kind by the translator lower the quality of the translation.

Therefore, to solve these problems, an auxiliary system is needed that supports translation work and minimizes human error and terminology inconsistency. In response to this need, the present invention relates to a service system that supports technical document translation work.

Korean Intellectual Property Office, Laid-Open Patent Publication No. 10-2013-0042839
Korean Intellectual Property Office, Registered Patent Publication No. 10-1052004

The present invention is intended to solve these problems of the prior art. One object is to provide a technical document translation support system that receives a search word to be translated and outputs example translation sentences containing that search word from among the reference data matching the technical classification of the original text, thereby minimizing the difficulty of translating technical documents.

Another object of the present invention is to provide a technical document translation support system that can detect human errors by outputting, together with their reference numerals, a list of the vocabulary items to which reference numerals are attached in the translation performance data produced by the translator.

The technical document translation support system according to the present invention may include: a translation support DB 100 that stores, divided by technical classification, reference data sets 10 each consisting of first reference data 10a of a first language formed of minimum unit sentences and second reference data 10b of a second language corresponding to the first reference data 10a; a user terminal 200 that is linked with the translation support DB 100 and loads the reference data sets 10; and a translation support application 300 installed on the user terminal 200. The translation support application 300 includes a translation work module 311 that outputs translation target data 20a, obtained by dividing an original text of the first language into minimum unit sentences, and receives translation performance data 20b of the second language corresponding to the translation target data 20a; a search module 312 that receives a search word and generates search word information; and a reference module 313 that searches the translation support DB 100 for, and outputs, the reference data sets 10 that match the technical classification of the translation target data 20a and contain one or more items of the search word information.

The search module 312 of the present invention may generate the search word information by collecting a dragged string from the translation target data 20a output through the translation work module 311.

The reference module 313 of the present invention may include: a reference data collection unit 313a that collects reference data sets 10 from the translation support DB 100 when search word information is delivered from the search module 312; a data division unit 313b that divides the first reference data 10a of the collected reference data sets 10 and the translation target data 20a containing the dragged string into words; a similarity measurement unit 313c that determines a similarity according to the number of words that match between the divided first reference data 10a and translation target data 20a; and a reference data output unit 313d that outputs the collected reference data sets 10 in descending order of the measured similarity.

In addition, the technical document translation support system according to the present invention may include: a translation support DB 100 that stores, divided by technical classification, reference data sets 10 each consisting of first reference data 10a of a first language formed of minimum unit sentences and second reference data 10b of a second language corresponding to the first reference data 10a; a user terminal 200 that is linked with the translation support DB 100 and loads the reference data sets 10; and a translation support application 300 installed on the user terminal 200. The translation support application 300 may include a translation support component 310 that receives translation performance data 20b of the second language corresponding to translation target data 20a, obtained by dividing an original text of the first language into minimum unit sentences, and outputs, from among the reference data sets 10 matching the technical classification of the translation target data 20a, those containing one or more of the entered search words; and an error detection component 320 that extracts the vocabulary items to which reference numerals are attached from at least one of the translation target data 20a and the translation performance data 20b, and outputs the extracted vocabulary list together with the reference numerals.

The error detection component 320 of the present invention may include an error detection extension module 322 that, according to a depth setting, extracts a similar-word check string that additionally contains the words adjacent to, before and after, each extracted vocabulary item, and outputs a list of the strings in the translation target data 20a and the translation performance data 20b whose matching rate with the similar-word check string is higher than a reference matching rate.

With the technical document translation support system according to the present invention, translation reference data for the relevant technical field can be searched easily by entering a search word while translating the original text. The translator can therefore quickly grasp the specialized vocabulary and terms of each technical field, which improves the speed and accuracy of translation.

In addition, according to the present invention, a list of the vocabulary items to which reference numerals are attached can be output from the translation performance data. The translator can therefore check many reference numerals and technical components at a glance, which minimizes human error and improves the quality of the translation.

FIG. 1 is a block diagram showing the basic configuration of a technical document translation support system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the basic configuration of a technical document translation support system according to another embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of a translation support application according to an embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of the translation support component of a technical document translation support system according to an embodiment of the present invention.
FIG. 5 is a diagram showing the output of the reference module and the term index module of a technical document translation support system according to an embodiment of the present invention.
FIG. 6 is a block diagram showing the configuration of the reference module of FIG. 4.
FIG. 7 is a block diagram showing the configuration of the error detection component of a technical document translation support system according to an embodiment of the present invention.
FIG. 8 is a diagram showing the output of the error detection module of a technical document translation support system according to an embodiment of the present invention.
FIG. 9 is a block diagram showing the configuration of the term unification component of a technical document translation support system according to an embodiment of the present invention.
FIG. 10 is a flowchart showing a method of operating a technical document translation support system according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying FIGS. 1 to 10. In describing the present invention, descriptions of well-known functions or configurations are omitted in order to keep the gist of the invention clear.

In the drawings and the detailed description, illustrations and descriptions of configurations and operations that those working in this field can readily understand from ordinary terminals, DBs (databases), applications, and the like are simplified or omitted.

Referring to FIGS. 1 and 2, the technical document translation support system according to an embodiment of the present invention includes a translation support DB 100, a user terminal 200, and a translation support application 300.

The translation support DB 100 is a collection of data in which data related to translated technical documents is stored, and may include a DBMS (Database Management System) for searching, storing, and managing the stored data.

The translation support DB 100 stores reference data sets 10, each consisting of first reference data 10a of a first language formed of minimum unit sentences and second reference data 10b of a second language corresponding to the first reference data 10a. The first and second languages may be any of various languages such as Korean, English, Chinese, Japanese, and Spanish, and they differ from each other. That is, a reference data set 10 contains first reference data 10a in the first language and second reference data 10b, which is its translation into the second language.

The first reference data 10a and the second reference data 10b are each formed of minimum unit sentences, where a minimum unit sentence may mean one sentence or a plurality of sentences. Here, a "sentence" is the smallest unit that expresses a complete meaning when a thought or feeling is put into language; in principle it has a subject and a predicate, although these are sometimes omitted.

If a minimum unit sentence means a single sentence, the first reference data 10a and the second reference data 10b can be delimited by terminal punctuation, since a sentence generally ends with a mark such as '.', '?', or '!'. Alternatively, if a minimum unit sentence means a plurality of sentences, the first reference data 10a and the second reference data 10b can be delimited by paragraph or by table-of-contents entry. In a patent specification, for example, they can be delimited by paragraph number ([0001], [0002], etc.) or by section label ([Technical Field], [Summary of the Invention], [Claims], etc.).
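The two delimiting rules can be sketched as follows. This is a minimal illustration in Python, not the patented implementation; the function names `split_by_sentence` and `split_by_paragraph_number` are hypothetical, and the sentence rule is deliberately naive (it would also split after an abbreviation such as "FIG." when followed by a space).

```python
import re


def split_by_sentence(text: str) -> list[str]:
    """Split at whitespace that follows a terminal mark ('.', '?' or '!')."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]


def split_by_paragraph_number(text: str) -> list[str]:
    """Split a specification at four-digit paragraph numbers such as [0001]."""
    parts = re.split(r"\[\d{4}\]", text)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    print(split_by_sentence("The frame is fixed. Is it rigid? Yes!"))
    print(split_by_paragraph_number("[0001] First paragraph. [0002] Second paragraph."))
```

Either function yields the list of minimum unit sentences for one language; pairing the source and target lists position by position would then give candidate reference data sets.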

The reference data sets 10 stored in the translation support DB 100 are divided and stored by technical classification. With this arrangement, only the reference data sets 10 that match the technical classification of the original text to be translated need to be loaded from the translation support DB 100, which minimizes load and improves data search speed.
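One way to picture storage divided by technical classification is an in-memory mapping from a classification key to its reference data sets. The sketch below is only an assumption for illustration: the class names `ReferenceDataSet` and `TranslationSupportDB`, the use of IPC-style keys, and the sample sentences are all hypothetical, and a real system would sit on a DBMS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceDataSet:
    first: str   # first reference data 10a (first language)
    second: str  # second reference data 10b (second language)


class TranslationSupportDB:
    """Hypothetical in-memory stand-in for the translation support DB 100."""

    def __init__(self) -> None:
        self._by_class: dict[str, list[ReferenceDataSet]] = defaultdict(list)

    def add(self, classification: str, first: str, second: str) -> None:
        self._by_class[classification].append(ReferenceDataSet(first, second))

    def load(self, classification: str) -> list[ReferenceDataSet]:
        """Return only the sets stored under the given classification."""
        return list(self._by_class.get(classification, []))


db = TranslationSupportDB()
db.add("H01L", "기판 위에 전극을 형성한다.", "An electrode is formed on the substrate.")
db.add("G06F", "서버가 요청을 수신한다.", "The server receives the request.")
print(len(db.load("H01L")))
```

Because `load` touches a single bucket, the cost of a later search depends on the size of one classification rather than on the whole DB.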

Here, the technical classification may be based on the name of a technical field (mechanical, electronic, computing, chemical, etc.), a key keyword of a technical field (light emitting device, thermoelectric device, big data, beacon, silicon, etc.), the International Patent Classification (IPC), the Cooperative Patent Classification (CPC), or the like.

The user terminal 200 is a device for data input and output possessed by the user, and may be a PC, a smartphone, a smart pad, or the like. The user terminal 200 is linked with the translation support DB 100 and loads the reference data sets 10.

The translation support application 300 is installed on the user terminal 200 and, as shown in FIG. 3, may include a translation support component 310, an error detection component 320, and a term unification component 330.

The translation support component 310 receives translation performance data 20b of the second language corresponding to translation target data 20a, which is obtained by dividing the original text of the first language into minimum unit sentences, and outputs, from among the reference data sets 10 that match the technical classification of the translation target data 20a, those that contain one or more of the entered search words.

Specifically, as shown in FIG. 4, the translation support component 310 may include a translation work module 311, a search module 312, a reference module 313, a technical classification module 314, and a storage module 315.

The translation work module 311 outputs the translation target data 20a, obtained by dividing the original text of the first language into minimum unit sentences, and receives the corresponding translation performance data 20b of the second language.

The search module 312 receives a search word and generates search word information. The search word may be a string dragged (selected) within the translation target data 20a output through the translation work module 311, or a string typed in as text. The search module 312 collects the dragged or typed string and generates the search word information from it.

The reference module 313 searches the translation support DB 100 for, and outputs, the reference data sets 10 that match the technical classification of the translation target data 20a and contain one or more items of the search word information generated by the search module 312. For example, as shown in FIG. 5, when the user enters the term "substrate" (기판 in the Korean original) as a search word by dragging it in the translation target data 20a, the reference module 313 searches for and outputs the reference data sets 10 that contain the entered search word at least once.
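The search step can be sketched as a two-stage filter: first restrict to the matching technical classification, then keep the sets that contain at least one search word. The sketch below assumes the DB is a plain mapping from classification to (first, second) sentence pairs and that the search word is matched against the first-language side only; the function name `search_references` and the sample data are hypothetical.

```python
def search_references(
    db: dict[str, list[tuple[str, str]]],
    classification: str,
    search_words: list[str],
) -> list[tuple[str, str]]:
    """Return the pairs under one classification that contain any search word."""
    candidates = db.get(classification, [])
    return [
        pair for pair in candidates
        if any(word in pair[0] for word in search_words)
    ]


sample_db = {
    "semiconductor": [
        ("기판 위에 전극을 형성한다.", "An electrode is formed on the substrate."),
        ("프레임을 고정한다.", "The frame is fixed."),
    ],
    "chemistry": [
        ("기판이 용액과 반응한다.", "The substrate reacts with the solution."),
    ],
}
for first, second in search_references(sample_db, "semiconductor", ["기판"]):
    print(first, "->", second)
```

Only the semiconductor pair containing 기판 is returned; the chemistry pair is never examined because its classification does not match.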

More specifically, as shown in FIG. 6, the reference module 313 may include a reference data collection unit 313a, a data division unit 313b, a similarity measurement unit 313c, and a reference data output unit 313d.

The reference data collection unit 313a collects reference data sets 10 from the translation support DB 100 when search word information is delivered from the search module 312, and the data division unit 313b divides the first reference data 10a of the collected reference data sets 10 and the translation target data 20a containing the search word information into words.

The similarity measurement unit 313c determines the similarity according to the number of words that match between the divided first reference data 10a and translation target data 20a. For example, if the first reference data 10a of a collected reference data set 10 contains 10 words in total and 8 of them also appear in the translation target data 20a containing the search word information, the similarity may be displayed as 80%.

The reference data output unit 313d outputs the reference data sets 10 collected by the reference data collection unit 313a as a list, and may sort them in descending order of the similarity measured by the similarity measurement unit 313c.
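The division, similarity, and output units can be sketched together. The sketch assumes whitespace tokenization and takes the similarity as the share of the reference sentence's words that also occur in the target sentence, which reproduces the 8-of-10 example (80%); the source does not fix an exact formula, so both choices are assumptions, and the function names are hypothetical.

```python
def similarity(reference_sentence: str, target_sentence: str) -> float:
    """Fraction of the reference words that also occur in the target sentence."""
    reference_words = reference_sentence.split()
    if not reference_words:
        return 0.0
    target_words = set(target_sentence.split())
    matched = sum(1 for word in reference_words if word in target_words)
    return matched / len(reference_words)


def rank_references(
    reference_sets: list[tuple[str, str]], target_sentence: str
) -> list[tuple[str, str]]:
    """Sort (first, second) pairs by descending similarity of the first side."""
    return sorted(
        reference_sets,
        key=lambda pair: similarity(pair[0], target_sentence),
        reverse=True,
    )


reference = "a b c d e f g h i j"   # 10 words
target = "a b c d e f g h x y"      # shares 8 of them
print(f"{similarity(reference, target):.0%}")
```

For morphologically rich languages such as Korean, whitespace tokens carry attached particles, so a production system would probably need a morphological analyzer before counting matches.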

The technical classification module 314 collects and generates technical classification information, for example by receiving a technical classification as input or by scanning the content of the uploaded original text and automatically determining the technical classification using big data. The technical classification module 314 may operate before or after the original text is uploaded, but preferably operates before the reference module 313 searches the reference data sets 10.

When selected by the user, the storage module 315 stores the translation target data 20a and the translation performance data 20b in the translation support DB 100 as one reference data set 10. That is, a new reference data set 10, in which the translation target data 20a serves as the first reference data 10a and the translation performance data 20b serves as the second reference data 10b, is stored in the translation support DB 100. In this way, the user's translations of original texts accumulate in the translation support DB 100 and can be reused.

The storage module 315 may receive the technical classification information generated by the technical classification module 314 and store the translation target data 20a and the translation performance data 20b in the part of the translation support DB 100 that matches the technical classification of the original text.
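As a rough sketch of this storage step, the finished segments can be paired position by position and appended under the classification of the original text. The function name `store_translation`, the `selected` flag standing in for the user's choice, and the dictionary used as the DB are all assumptions for illustration.

```python
def store_translation(
    db: dict[str, list[tuple[str, str]]],
    classification: str,
    target_data: list[str],
    performance_data: list[str],
    selected: bool,
) -> int:
    """Store (target, performance) pairs as new reference sets; return how many."""
    if not selected:
        return 0  # nothing is stored unless the user chose to save
    if len(target_data) != len(performance_data):
        raise ValueError("each source segment needs exactly one translated segment")
    bucket = db.setdefault(classification, [])
    bucket.extend(zip(target_data, performance_data))
    return len(target_data)


storage = {}
stored = store_translation(
    storage, "H01L",
    ["기판을 세정한다.", "전극을 형성한다."],
    ["The substrate is cleaned.", "An electrode is formed."],
    selected=True,
)
print(stored, list(storage))
```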

The error detection component 320 detects technical component terms whose reference numerals do not match, or for which no reference numeral is written, and, as shown in FIG. 7, may include an error detection module 321 and an error detection extension module 322.

In response to an error detection request, the error detection module 321 extracts the vocabulary items to which reference numerals are attached from error detection target data, which consists of part or all of the translation target data 20a, the translation performance data 20b, or both. The error detection request may mean that, after the error detection component 320 has been started, the error detection target data is copied (Ctrl+C) to the clipboard of an application program installed on the user terminal. In that case, the vocabulary items with reference numerals are extracted from the error detection target data copied to the clipboard.

The extracted vocabulary list is output together with the reference numerals and the extraction frequency, as shown in FIG. 8. The extraction frequency is the number of times each vocabulary item was extracted. For example, when a vocabulary list is extracted from the translation performance data 20b as in FIG. 8, the item "second frames 210" selected in the list is shown with an extraction frequency of 8. This means that the item "second frames 210" was found and extracted eight times in total in the translation performance data 20b.
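The extraction and frequency count can be sketched with a regular expression and a counter. The sketch makes a strong simplifying assumption that is not in the source: a term is taken to be the single word written directly before the numeral, so a multi-word item such as "second frames 210" would be recorded as "frames 210". The names `extract_marked_terms` and `terms_by_numeral` are hypothetical.

```python
import re
from collections import Counter

# Assumption: one word, then a numeral such as "210", "(210)" or "313a".
MARKED_TERM = re.compile(r"([A-Za-z가-힣-]+)\s*\(?(\d+[a-z]?)\)?")


def extract_marked_terms(text: str) -> Counter:
    """Count each (term, reference numeral) pair found in the text."""
    return Counter(
        (match.group(1).lower(), match.group(2))
        for match in MARKED_TERM.finditer(text)
    )


def terms_by_numeral(counts: Counter) -> dict[str, list[str]]:
    """Group the distinct terms seen with each numeral for side-by-side review."""
    grouped: dict[str, set[str]] = {}
    for term, numeral in counts:
        grouped.setdefault(numeral, set()).add(term)
    return {numeral: sorted(terms) for numeral, terms in grouped.items()}


sample = (
    "The second frame 210 is joined to the first frame 110. "
    "The second frame 210 supports the flame 210."
)
counts = extract_marked_terms(sample)
for (term, numeral), frequency in counts.items():
    print(f"{term} {numeral}: {frequency}")
```

Grouping by numeral puts "frame" and "flame" next to each other under 210, which is the kind of mismatch a translator would look for in the output list.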

The error detection extension module 322 extracts a similar-word check string that additionally contains the words adjacent to, before and after, each vocabulary item extracted by the error detection module 321. The number of adjacent words before and after is determined by a depth setting.

For example, suppose the translation target data 20a contains the Korean sentence "과산화수소가 분해기(200)를 지나온 지점에 pH 측정기를 설치하여 안전 수치를 벗어날 경우 기기 작동이 정지하도록 할 수 있다." ("A pH meter may be installed at the point where the hydrogen peroxide has passed through the decomposer 200, so that the device stops operating if the reading leaves the safe range."). The vocabulary item extracted from this sentence is "분해기(200)" ("decomposer 200"), to which a reference numeral is attached. The similar-word check string extracted here may be "과산화수소가 분해기(200)를 지나온" when the depth is 1, and "과산화수소가 분해기(200)를 지나온 지점에 pH" when the depth is 3; only one word precedes the item in this sentence, so the string extends no further on that side. That is, the similar-word check string contains the run of words located before and after the extracted vocabulary item, and the larger the depth value, the more words it contains.
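The depth rule can be sketched as a token window around the extracted item. The sketch assumes whitespace tokenization and clips the window at the sentence boundaries, which reproduces both strings in the example; the function name `similar_word_string` is hypothetical.

```python
def similar_word_string(sentence: str, term: str, depth: int) -> str:
    """Return the term plus up to `depth` whitespace tokens on each side."""
    tokens = sentence.split()
    index = next((i for i, token in enumerate(tokens) if term in token), None)
    if index is None:
        raise ValueError(f"term not found: {term}")
    start = max(0, index - depth)  # clip at the start of the sentence
    return " ".join(tokens[start:index + depth + 1])


sentence = (
    "과산화수소가 분해기(200)를 지나온 지점에 pH 측정기를 설치하여 "
    "안전 수치를 벗어날 경우 기기 작동이 정지하도록 할 수 있다."
)
print(similar_word_string(sentence, "분해기(200)", 1))
print(similar_word_string(sentence, "분해기(200)", 3))
```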

Next, the error detection extension module 322 extracts, from the translation target data 20a and the translation performance data 20b, the strings whose matching rate with the extracted similar-word check string is higher than a reference matching rate, and generates and outputs a list of similar-word check strings.
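The source does not define how the matching rate is computed. Purely as an assumption, the sketch below slides a window of the same word count over the text and takes the matching rate as the share of the check string's words found in each window; the names `matching_rate` and `similar_strings` and the sample text are hypothetical.

```python
def matching_rate(probe: str, window: str) -> float:
    """Share of the probe's distinct words that also occur in the window."""
    probe_tokens = set(probe.lower().split())
    if not probe_tokens:
        return 0.0
    return len(probe_tokens & set(window.lower().split())) / len(probe_tokens)


def similar_strings(
    text: str, probe: str, reference_rate: float
) -> list[tuple[str, float]]:
    """List every window whose matching rate exceeds the reference rate."""
    tokens = text.split()
    size = len(probe.split())
    if size == 0:
        return []
    hits = []
    for i in range(len(tokens) - size + 1):
        window = " ".join(tokens[i:i + size])
        rate = matching_rate(probe, window)
        if rate > reference_rate:
            hits.append((window, rate))
    return hits


text = "the second frame 210 is fixed and the second frame is welded"
for window, rate in similar_strings(text, "second frame 210 is", 0.7):
    print(f"{rate:.2f}  {window}")
```

With a reference rate of 0.7, the window "the second frame is" is listed even though it lacks the numeral 210, which is how an occurrence with a forgotten reference numeral could surface for review. Overlapping windows are all reported, so a real tool would likely merge or deduplicate them.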

With this configuration, human errors that the error detection module 321 alone cannot catch, such as a technical component that should carry a reference numeral but was left without one, can be detected effectively.

The term unification component 330 is an auxiliary means for keeping the translation of the same term or phrase consistent throughout the translated text and, as shown in FIG. 9, may include a term index module 331 and a boilerplate index module 332.

The term index module 331 receives and stores an index request term set 30, as shown in FIG. 5. The index request term set 30 consists of a first index request term 30a in the first language and the corresponding second index request term 30b in the second language. The term index module 331 checks whether the first index request term 30a of a stored index request term set 30 appears in the translation target data 20a that the translator is currently translating and, if it does, outputs that index request term set 30 so that the translator can notice it.

Here, the term index module 331 may treat the translation target data 20a that corresponds to the translation performance data 20b in which the cursor is currently located as the translation target data 20a currently being translated.

Like the term index module 331, the boilerplate index module 332 receives and stores an index request boilerplate set 40. The index request boilerplate set 40 consists of a first index request boilerplate 40a in the first language and the corresponding second index request boilerplate 40b in the second language. The boilerplate index module 332 checks whether the first index request boilerplate 40a of a stored index request boilerplate set 40 appears in the translation target data 20a that the translator is currently translating and, if it does, outputs that index request boilerplate set 40 so that the translator can notice it.
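The lookup performed by the two index modules can be sketched as a substring check of the stored first-language entries against the segment under the cursor. This is a minimal illustration, not the patented implementation: the names `IndexEntry`, `current_segment`, and `entries_in_segment` are hypothetical, and plain substring matching is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    source: str  # first-language term or boilerplate (30a / 40a)
    target: str  # second-language counterpart (30b / 40b)


def current_segment(source_segments, cursor_index):
    """The source segment paired with the target segment that holds the cursor."""
    return source_segments[cursor_index]


def entries_in_segment(entries, segment):
    """Entries whose first-language text occurs in the segment being translated."""
    return [entry for entry in entries if entry.source in segment]


source_segments = [
    "과산화수소가 분해기(200)를 지나온다.",
    "pH 측정기를 설치할 수 있다.",
]
entries = [
    IndexEntry("분해기", "decomposer"),  # term
    IndexEntry("pH 측정기", "pH meter"),  # term
    IndexEntry("할 수 있다", "may"),      # boilerplate
]
segment = current_segment(source_segments, cursor_index=1)
for entry in entries_in_segment(entries, segment):
    print(entry.source, "->", entry.target)
```

With the cursor in the second segment, only the two entries that occur in that segment are shown, so the translator sees the agreed rendering at the moment it is needed.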

The operating method of the technical document translation support system according to an embodiment of the present invention is described below.

Referring to FIG. 10, once the translation support system according to the present invention has been started and the user enters the technical field of the translation target data 20a, the user terminal 200 loads the reference data sets 10 for that technical field from the translation support DB 100. After the technical field has been entered, the translation target data 20a to be translated is loaded.

Next, when the user enters a search term during translation, the translation support component 310 runs and extracts the reference data sets 10 that fit the search term. If the loaded translation support DB 100 contains no matching reference data set 10, the result of looking up the search term in an open dictionary is displayed. If one or more matching reference data sets 10 are extracted, they are output sorted in descending order of similarity to the translation target data 20a.
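The search step described above, with its dictionary fallback and similarity ordering, might look like the following sketch. Several points are assumptions: the similarity measure (`difflib.SequenceMatcher`), the representation of a reference data set as a `(first_language, second_language)` tuple, and the `dictionary_lookup` callback, which stands in for the open dictionary. None of these names come from the patent.

```python
from difflib import SequenceMatcher


def search_references(term, segment, reference_sets, dictionary_lookup):
    """Return ("references", sorted pairs) or ("dictionary", fallback result)."""
    # Keep only the pairs whose first-language sentence contains the search term.
    hits = [pair for pair in reference_sets if term in pair[0]]
    if not hits:
        return ("dictionary", dictionary_lookup(term))

    def similarity(pair):
        return SequenceMatcher(None, segment, pair[0]).ratio()

    # Most similar reference sentence first (descending order).
    return ("references", sorted(hits, key=similarity, reverse=True))


segment = "분해기(200) 뒤에 pH 측정기를 설치한다."
reference_sets = [
    ("pH 측정기는 산도를 측정한다.", "The pH meter measures acidity."),
    ("반응기 뒤에 pH 측정기를 설치한다.", "A pH meter is installed after the reactor."),
]


def fake_dictionary(term):
    # Placeholder for an open-dictionary query.
    return f"dictionary entry for {term}"


print(search_references("pH 측정기", segment, reference_sets, fake_dictionary))
print(search_references("촉매", segment, reference_sets, fake_dictionary))
```

The first call returns both reference pairs with the sentence closest to the current segment at the top. The second call finds no reference pair and falls back to the dictionary callback.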

Next, when the user enters an error detection request, the error detection component 320 extracts the vocabulary that carries reference numerals and outputs a vocabulary list. If error detection extension is selected, a similar-word check string containing the adjacent words before and after each extracted vocabulary item in the list is extracted according to the configured depth. Strings whose matching rate against that similar-word check string exceeds the reference matching rate are then extracted from the translation target data 20a and the translation performance data 20b, and a list of similar-word check strings is output.
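The first stage of this flow, listing vocabulary together with its reference numerals, can be approximated with a regular expression. This is a sketch under assumptions: it only recognizes the Korean-style notation `term(numeral)` on a single whitespace-free word, and grouping numerals per term to flag mismatches is an illustrative choice rather than something the text prescribes. The names `NUMBERED_TERM`, `numbered_vocabulary`, and `inconsistent_terms` are hypothetical.

```python
import re
from collections import defaultdict

# A run of non-space, non-parenthesis characters followed by "(digits[letter])".
NUMBERED_TERM = re.compile(r"([^\s()]+)\((\d+[a-z]?)\)")


def numbered_vocabulary(segments):
    """Map each term to the sorted reference numerals it appears with."""
    found = defaultdict(set)
    for segment in segments:
        for term, numeral in NUMBERED_TERM.findall(segment):
            found[term].add(numeral)
    return {term: sorted(numerals) for term, numerals in found.items()}


def inconsistent_terms(vocabulary):
    """Terms written with more than one reference numeral."""
    return {term: nums for term, nums in vocabulary.items() if len(nums) > 1}


segments = [
    "과산화수소가 분해기(200)를 지나온다.",
    "분해기(210)에 pH 측정기(300)를 설치한다.",
]
vocabulary = numbered_vocabulary(segments)
print(vocabulary)
print(inconsistent_terms(vocabulary))
```

Here "분해기" appears with both 200 and 210, so it is flagged for review. Multi-word terms such as "pH 측정기" are captured only by their last word, which a real implementation would need to handle.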

Finally, when the system has finished handling the search term or error detection request and the translator has completed the translation, the translation target data 20a and the translation performance data 20b are stored in the translation support DB 100 as a new reference data set 10. To translate another source text, the process returns to the first step.

A person of ordinary skill in the art to which the present invention pertains can modify and vary the technical document translation support system according to this embodiment in many ways without departing from the essential characteristics of the invention, and the scope of the technical idea of the present invention is not limited by this embodiment.

10: reference data set
10a: first reference data
10b: second reference data
20a: translation target data
20b: translation performance data
30: index request term set
30a: first index request term
30b: second index request term
40: index request boilerplate set
40a: first index request boilerplate
40b: second index request boilerplate
100: translation support DB
200: user terminal
300: translation support application
310: translation support component
311: translation work module
312: search module
313: reference module
313a: reference data collection unit
313b: data dividing unit
313c: similarity measuring unit
313d: reference data output unit
314: technical classification module
315: storage module
320: error detection component
321: error detection module
322: error detection extension module
330: term unification component
331: term index module
332: boilerplate index module

Claims

1. A technical document translation support system comprising:
a translation support DB (100) in which reference data sets (10), each consisting of first reference data (10a) in a first language formed as minimum-unit sentences and second reference data (10b) in a second language corresponding to the first reference data (10a), are stored separately by technical classification;
a user terminal (200) that loads the reference data sets (10) in conjunction with the translation support DB (100); and
a translation support application (300) installed on the user terminal (200),
wherein the translation support application (300) includes:
a translation support component (310) that receives translation performance data (20b) in the second language corresponding to translation target data (20a), in which an original text in the first language is divided into minimum-unit sentences, and that outputs, from among the reference data sets (10) matching the technical classification of the translation target data (20a), a reference data set (10) containing one or more input search terms; and
an error detection component (320) that extracts vocabulary bearing reference numerals from at least one of the translation target data (20a) and the translation performance data (20b) and outputs a list of the extracted vocabulary together with the reference numerals,
and wherein the translation support component (310) includes:
a translation work module (311) that outputs the translation target data (20a) and receives the translation performance data (20b) in the second language corresponding to the translation target data (20a);
a search module (312) that receives a search term and generates search term information;
a reference module (313) that searches the translation support DB (100) for, and outputs, a reference data set (10) that matches the technical classification of the translation target data (20a) and contains one or more items of the search term information; and
a storage module (315) that stores, in the translation support DB (100) as one reference data set (10), the translation target data (20a) loaded through the translation work module (311) and the translation performance data (20b) input through the translation work module (311).

2. The technical document translation support system of claim 1,
wherein the error detection component (320) includes:
an error detection module (321); and
an error detection extension module (322) that extracts a similar-word check string further including the words adjacent before and after the vocabulary extracted by the error detection module (321),
whereby the system detects technical component terms whose reference numerals do not match or for which reference numerals were omitted.

3. The technical document translation support system of claim 2,
wherein the error detection extension module (322),
according to a depth setting, extracts the similar-word check string further including the words adjacent before and after the extracted vocabulary, extracts from the translation target data (20a) and the translation performance data (20b) the strings whose matching rate against the similar-word check string exceeds a reference matching rate, and outputs a list of similar-word check strings.

4. (Deleted)