CN105335359A - Term extracting method used for translation teaching system - Google Patents

Info

Publication number
CN105335359A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510792918.9A
Other languages
Chinese (zh)
Inventor
张马成
王兴强
屈耕
熊易
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHENGDU URELITE INFORMATION TECHNOLOGY Co Ltd
Original Assignee
CHENGDU URELITE INFORMATION TECHNOLOGY Co Ltd

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a term extraction method for a translation teaching system. The method comprises the following steps: opening the translation file to be extracted; manually setting the maximum length, the minimum length, and the minimum frequency of occurrence of translation corpus entries; if the translation file is in Chinese or Japanese, extracting data with Pangu word segmentation; if the translation file is in English, Russian, or German, extracting data by exhaustive enumeration; comparing the extracted data with the configured maximum length, minimum length, and minimum frequency; when the length of an extracted item lies between the minimum and maximum lengths and its number of occurrences reaches the minimum frequency, adding the term to the term base; and displaying the added items in the term base. With this method, the term base can be selectively and continuously enriched according to the needs of different customers, which overcomes the inability of a translation vocabulary to cover every field and yields greater specialization and flexibility.

Description

Term extraction method for a translation teaching system
Technical field
The present invention relates to the field of translation teaching, and in particular to a term extraction method for a translation teaching system.
Background
The translation teaching system is a teaching and experiment platform developed on the basis of the Transmate enterprise edition and adapted to university teaching practice. It emphasizes interaction between teachers and students: students use it to learn and understand CAT (computer-assisted translation) technology on the one hand, and to learn the working mode of a translation company by simulation on the other. This cultivates more practically skilled graduates for society, improves university students' professional ability, and strengthens their competitiveness in the job market. In the translation teaching system, teachers can assign translation exercises to students, and students can complete them directly in the system. An exercise includes translating individual segments of the source text, and because the same English word can have several different meanings, the meaning must be judged from context. In existing teaching systems the vocabulary in the term base is fixed. It inevitably has omissions, and as new terms appear over time they cannot be retrieved from the term base, which makes the final translation inaccurate.
Summary of the invention
The technical problem to be solved by the present invention is to provide a term extraction method for a translation teaching system. With this method the term base can be selectively and continuously enriched according to the needs of different customers, which overcomes the inability of a translation vocabulary to cover every field and yields greater specialization and flexibility.
The technical solution adopted by the present invention to solve this problem is a term extraction method for a translation teaching system, comprising the following steps:
A) opening the translation file to be extracted;
B) manually setting the maximum length and minimum length of translation corpus entries and the minimum frequency with which they occur;
C) if the translation file is in Chinese or Japanese, extracting data with Pangu word segmentation; if the translation file is in English, Russian, or German, extracting data by exhaustive enumeration;
D) comparing the extracted data with the configured maximum length, minimum length, and minimum frequency; when the length of an extracted item lies between the minimum and maximum lengths and its number of occurrences reaches the minimum frequency, adding the term to the term base; terms already present in the term base are not added again;
E) displaying the added items in the term base.
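Step D) can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `select_terms`, the representation of each candidate as a tuple of tokens, the measurement of length in tokens, and the use of a Python set as the term base are all assumptions made for illustration.

```python
from collections import Counter


def select_terms(candidates, min_length, max_length, min_frequency, term_base):
    """Add qualifying candidates to term_base and return the newly added ones.

    candidates: iterable of token tuples, one entry per occurrence (assumed format).
    term_base:  a set of token tuples standing in for the real term base.
    """
    counts = Counter(candidates)  # occurrences of each distinct phrase
    added = []
    for phrase, count in counts.items():
        length = len(phrase)  # assumption: length is counted in tokens
        within_bounds = min_length <= length <= max_length
        frequent_enough = count >= min_frequency
        if within_bounds and frequent_enough and phrase not in term_base:
            term_base.add(phrase)   # entries already present are skipped
            added.append(phrase)    # collected so step E) can display them
    return added
```

Returning the list of newly added phrases keeps the display step separate from the filtering logic; how the real system stores and shows its term base is not described in the source.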
Extracting data by exhaustive enumeration means listing all objects and then segmenting them into phrases one by one; alternatively, data are extracted with Pangu word segmentation. The length and number of occurrences of each phrase are compared with the configured maximum length, minimum length, and minimum frequency. When the length of a phrase lies between the minimum and maximum lengths and its number of occurrences reaches the minimum frequency, the term is added to the term base; terms already present are not added again. Vocabulary added to the term base can be retrieved for use in later translation. This overcomes the inability of a translation vocabulary to cover every field: the term base keeps pace with the times, and more relevant vocabulary can be added according to the vocabulary each user works with, giving greater specialization and flexibility.
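For languages that separate words with spaces, the exhaustive enumeration might look like the following sketch. The patent does not specify tokenization or whether a phrase may cross punctuation; the hypothetical `enumerate_phrases` below lowercases the text, splits it at punctuation, and yields every contiguous run of words whose length is within the configured bounds.

```python
import re


def enumerate_phrases(text, min_length, max_length):
    """Yield every contiguous word sequence of min_length..max_length words."""
    # Assumption: phrases do not span punctuation marks or line breaks.
    for chunk in re.split(r"[.!?;:,\n]+", text):
        # \w matches Unicode letters, so Russian and German words are covered.
        words = re.findall(r"\w+(?:[-']\w+)*", chunk.lower())
        for start in range(len(words)):
            for size in range(min_length, max_length + 1):
                if start + size > len(words):
                    break  # no longer phrase fits from this start position
                yield tuple(words[start:start + size])
```

Each yielded tuple is one occurrence of a candidate phrase, so counting the tuples gives the occurrence numbers that are compared with the minimum frequency.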
The Pangu word segmentation in step C) is implemented with a Chinese-English word segmentation component that quickly segments the corpus into phrases. It can recognize some words that do not appear in its dictionary, and its high recognition rate makes it easier to enrich the term base later.
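The choice between the two extraction routes in step C) can be sketched as a simple dispatch. The language codes, `choose_extractor`, and `forward_max_match` are hypothetical names. The real Pangu component is not used here; `forward_max_match` is a much simpler dictionary-based stand-in that, unlike the component described above, cannot recognize words missing from its dictionary.

```python
SEGMENTED_LANGUAGES = {"zh", "ja"}         # Chinese, Japanese: word segmentation
ENUMERATED_LANGUAGES = {"en", "ru", "de"}  # English, Russian, German: enumeration


def choose_extractor(language, segment, enumerate_all):
    """Return the extraction callable that step C) prescribes for a language."""
    if language in SEGMENTED_LANGUAGES:
        return segment
    if language in ENUMERATED_LANGUAGES:
        return enumerate_all
    raise ValueError(f"no extraction route defined for language {language!r}")


def forward_max_match(text, dictionary, max_word_length=4):
    """Greedy longest-match segmentation (a stand-in, NOT the Pangu algorithm).

    At each position the longest dictionary word is taken; characters that
    start no dictionary word become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        longest = min(max_word_length, len(text) - i)
        for size in range(longest, 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

For example, with a dictionary containing 机器, 翻译, 机器翻译, and 系统, the stand-in segments 机器翻译系统 into 机器翻译 and 系统, because the longest match is preferred at each position.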
The maximum length, minimum length, and minimum frequency of translation corpus entries in step B) are positive numbers.
In summary, the beneficial effects of the invention are as follows: with this method the term base can be selectively enriched according to the translation needs of different customers in different fields, so it is highly flexible and overcomes the inability of a translation vocabulary to cover every field. The longer different customers use the system, the richer its specialized vocabulary becomes, and later translations that draw on the term base are more specialized.
Embodiments
The present invention is described in further detail below with reference to embodiments, but the embodiments of the invention are not limited to them.
Embodiment 1:
The present invention comprises a term extraction method for a translation teaching system, comprising the following steps:
A) opening the translation file to be extracted;
B) manually setting the maximum length and minimum length of translation corpus entries and the minimum frequency with which they occur;
C) if the translation file is in Chinese or Japanese, extracting data with Pangu word segmentation; if the translation file is in English, Russian, or German, extracting data by exhaustive enumeration;
D) comparing the extracted data with the configured maximum length, minimum length, and minimum frequency; when the length of an extracted item lies between the minimum and maximum lengths and its number of occurrences reaches the minimum frequency, adding the term to the term base; terms already present in the term base are not added again;
E) displaying the added items in the term base.
Extracting data by exhaustive enumeration means listing all objects and then segmenting them into phrases one by one; alternatively, data are extracted with Pangu word segmentation. The length and number of occurrences of each phrase are compared with the configured maximum length, minimum length, and minimum frequency. When the length of a phrase lies between the minimum and maximum lengths and its number of occurrences reaches the minimum frequency, the term is added to the term base; terms already present are not added again. Vocabulary added to the term base can be retrieved for use in later translation. This overcomes the inability of a translation vocabulary to cover every field: the term base keeps pace with the times, and more relevant vocabulary can be added according to the vocabulary each user works with, giving greater specialization and flexibility. The maximum length, minimum length, and minimum frequency are positive integers whose values can be set according to the user's actual needs; typically the minimum length is set to 1, the maximum length to a value between 3 and 5, and the minimum frequency to 3.
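The typical values above could be captured in a small settings object. This is a sketch under assumptions: the class and field names are invented, the default maximum length of 4 is just one value from the 3 to 5 range, and the check that the minimum does not exceed the maximum is inferred from the word "between" rather than stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionSettings:
    """User-configurable thresholds from step B) (hypothetical names)."""

    min_length: int = 1      # typical value given in the text
    max_length: int = 4      # assumption: one value from the typical 3 to 5 range
    min_frequency: int = 3   # typical value given in the text

    def __post_init__(self):
        for name in ("min_length", "max_length", "min_frequency"):
            value = getattr(self, name)
            # bool is a subclass of int, so it is rejected explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")
        if self.min_length > self.max_length:
            # Inferred constraint: otherwise no length could lie between the bounds.
            raise ValueError("min_length must not exceed max_length")
```

Validating at construction time means an invalid combination is reported when the user enters it, before any file is processed.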
Embodiment 2:
This embodiment refines embodiment 1 as follows: the Pangu word segmentation in step C) is implemented with a Chinese-English word segmentation component that quickly segments the corpus into phrases. It can recognize some words that do not appear in its dictionary, and its high recognition rate makes it easier to enrich the term base later.
The maximum length, minimum length, and minimum frequency of translation corpus entries in step B) are positive numbers.
The above are only preferred embodiments of the present invention and do not limit it in any form; any simple modification or equivalent variation made to the above embodiments in accordance with the technical essence of the invention falls within its scope of protection.

Claims (3)

1. A term extraction method for a translation teaching system, characterized in that it comprises the following steps:
A) opening the translation file to be extracted;
B) manually setting the maximum length and minimum length of translation corpus entries and the minimum frequency with which they occur;
C) if the translation file is in Chinese or Japanese, extracting data with Pangu word segmentation; if the translation file is in English, Russian, or German, extracting data by exhaustive enumeration;
D) comparing the extracted data with the configured maximum length, minimum length, and minimum frequency; when the length of an extracted item lies between the minimum and maximum lengths and its number of occurrences reaches the minimum frequency, adding the term to the term base; terms already present in the term base are not added again;
E) displaying the added items in the term base.
2. The term extraction method for a translation teaching system according to claim 1, characterized in that the Pangu word segmentation in step C) is implemented with a Chinese-English word segmentation component that quickly segments the corpus into phrases.
3. The term extraction method for a translation teaching system according to claim 1, characterized in that the maximum length, minimum length, and minimum frequency of translation corpus entries in step B) are positive numbers.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510792918.9A CN105335359A (en) 2015-11-18 2015-11-18 Term extracting method used for translation teaching system

Publications (1)

Publication Number Publication Date
CN105335359A true CN105335359A (en) 2016-02-17

Family

ID=55285904

Country Status (1)

Country Link
CN (1) CN105335359A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100106481A1 (en) * 2007-10-09 2010-04-29 Yingkit Lo Integrated system for recognizing comprehensive semantic information and the application thereof
CN101354712A (en) * 2008-09-05 2009-01-28 北京大学 System and method for automatically extracting Chinese technical terms
CN101685455A (en) * 2008-09-28 2010-03-31 华为技术有限公司 Method and system of data retrieval
CN103885939A (en) * 2012-12-19 2014-06-25 新疆信息产业有限责任公司 Uyghur-Chinese bi-directional translation memory system construction method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
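Zhou Shaojun et al.: "Automatic extraction of patent terms based on multi-strategy fusion", Computer Applications and Software *
Sun Le et al.: "Automatic extraction of bilingual term dictionaries from parallel corpora", Journal of Chinese Information Processing *
Li Min et al.: "Construction of a multithreaded full-text retrieval system", Journal of Yangtze University (Natural Science Edition) *
Li Xiuying: "Terminology and machine translation: analysis of experimental results and construction of a terminology database", Research and Exploration in Laboratory *
Jia Xiuling et al.: "Research on a method for extracting taxonomic relations in ontology learning", Computer Technology and Development *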

Similar Documents

Publication Publication Date Title
Yu et al. Chinese spelling error detection and correction based on language model, pronunciation, and shape
US20190043504A1 (en) Speech recognition method and device
US20160163309A1 (en) Method of selecting training text for language model, and method of training language model using the training text, and computer and computer program for executing the methods
TWI567569B (en) Natural language processing systems, natural language processing methods, and natural language processing programs
CN111144102B (en) Method and device for identifying entity in statement and electronic equipment
CN105225657A (en) Polyphone mark template generation method and device
CN111178098B (en) Text translation method, device, equipment and computer readable storage medium
CN102193646A (en) Method and device for generating personal name candidate words
Depecker How to build terminology science
Gutkin et al. FonBund: A library for combining cross-lingual phonological segment data
CN109002454B (en) Method and electronic equipment for determining spelling partition of target word
CN110956043A (en) Domain professional vocabulary word embedding vector training method, system and medium based on alias standardization
CN105335359A (en) Term extracting method used for translation teaching system
CN113239151B (en) Method, system and equipment for enhancing spoken language understanding data based on BART model
US10755594B2 (en) Method and system for analyzing a piece of text
CN109446537B (en) Translation evaluation method and device for machine translation
CN113988047A (en) Corpus screening method and apparatus
CN112966510A (en) Weapon equipment entity extraction method, system and storage medium based on ALBERT
CN111090720A (en) Hot word adding method and device
CN104699670A (en) File splitting method and device
CN112528680A (en) Corpus expansion method and system
Cohn et al. Variation in two patterns of word-initial deletion in Jakarta Indonesian: Insight from naturalistic data
Ginting et al. TRANSLATION METHODS IN READER’S DIGEST MAGAZINE BY STUDENTS’OF ENGLISH DEPARTMENT AT HKBP NOMMENSEN UNIVERSITY
Sukriana Comparing the Lexicon between Palaluar sub-Dialect and Standard Minangkabau language
Garcıa et al. Obtaining parallel corpora for multilingual spoken language understanding tasks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 610000 B, building 4, building 200, Tianfu five street, Chengdu hi tech Zone, Sichuan,

Applicant after: Chengdu excellent translation information technology Limited by Share Ltd

Address before: 610000, No. 1, building 107, 1 West Bauhinia Road, Chengdu hi tech Zone, Sichuan, 6

Applicant before: Chengdu Urelite Information technology Co., Ltd.

COR Change of bibliographic data
RJ01 Rejection of invention patent application after publication

Application publication date: 20160217

RJ01 Rejection of invention patent application after publication