CN108153743A - Intelligent offline translation machine based on similarity - Google Patents


Info

Publication number
CN108153743A
Authority
CN
China
Prior art keywords
word
sentence
translation
translated
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810064998.XA
Other languages
Chinese (zh)
Other versions
CN108153743B (en)
Inventor
张斌
张锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiaguyi (Beijing) Language Technology Co.,Ltd.
Original Assignee
Chengdu Haizhiyi Translation Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-01-23
Filing date
2018-01-23
Publication date
2018-06-12
Application filed by Chengdu Haizhiyi Translation Co., Ltd.
Priority to CN201810064998.XA
Publication of CN108153743A
Application granted
Publication of CN108153743B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides an intelligent offline translation machine based on similarity, in particular one for translating Chinese into English. It segments the Chinese sentence, computes similarity and applies English generation rules to obtain English that meets the requirements. To a certain extent this removes the need to rely on a network database. By processing only the Chinese sentence to be translated, in combination with preset English translation rules, the machine can still obtain accurate translation results.

Description

Intelligent offline translation machine based on similarity
Technical field
The invention belongs to the field of automatic translation, and in particular relates to an intelligent offline translation machine based on similarity.
Background art
With the development of smart devices, intelligent operating systems have become increasingly diverse. Examples include Apple's iOS, Google's Android and Mozilla's Firefox OS. Smart devices running these systems are used by more and more people for daily activities such as entertainment, social networking and reading.
As society becomes more open, people have more opportunities to read content that is not in their native language. Whether reading for pleasure or for academic or work purposes, they frequently encounter foreign-language material. At present, the most common way to look up a foreign word on a smart device is for the user to open a dictionary application manually and type the word in. A somewhat better-made application of this kind is Youdao Dictionary.

Currently popular automatic translation methods fall into three main classes:
- Word-based translation. The word is the basic unit of translation, and neither context nor linguistic knowledge is taken into account. The target-language word corresponding to each source-language word is found first. Target-language words are then inserted or deleted, and their order is adjusted to form the target-language sentence. This approach is fast but has poor accuracy.
- Phrase-based translation. The translation granularity is extended from words to phrases. This better handles local context dependence and greatly improves fluency and accuracy.
- Syntax-based translation. Syntactic structure information is introduced into the translation process, which requires knowledge of grammatical structure. Syntactic knowledge is used to adjust the source-language word order before translation and to reorder the output after translation.

Among current computer-based automatic translation methods, the third class represents the trend. However, good translation quality generally requires obtaining the syntactic structure through an online connection, and translation is also relatively slow. The Internet is widely available, but changing environments and various temporary conditions mean that smart devices cannot stay online at all times. There is therefore an urgent need for an intelligent offline translation machine based on similarity that, as far as possible, can still obtain accurate translation results without a network connection.
Summary of the invention
In view of the above analysis, the main object of the present invention is to provide an intelligent offline translation machine based on similarity that overcomes the above drawbacks, in particular one for Chinese-to-English translation. It segments the Chinese sentence, computes similarity and applies English generation rules to produce English that meets the requirements. To a certain extent this removes the need to rely on a network database. By processing only the Chinese sentence to be translated, in combination with preset English translation rules, the machine obtains accurate translation results.
The object of the present invention is achieved through the following technical solution.
An intelligent offline translation machine based on similarity, characterized by comprising:
a Chinese input module, configured to receive an input Chinese sentence and to perform word segmentation according to a Chinese-English dictionary to obtain the correct segmentation;
a similarity computation module, configured to use certain features of the Chinese sentence to be translated as query conditions to retrieve similar sentences from a database, and to select the closest sentence according to the degree of similarity (that is, to perform similarity computation);
an alignment module, configured to align, according to preset alignment rules, the Chinese sentence to be translated with the sentence in the database, and to align the words of the Chinese sentence with the words of the English sentence in the database;
a translation module, configured to translate, according to preset English translation rules, into English that meets the requirements.
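The four modules form a straight pipeline. The sketch below shows that wiring in Python, assuming each module is a plain function supplied by the caller. The class name `OfflineTranslator`, its field names and the `(segmented Chinese, English)` example record are hypothetical and are not taken from the patent:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Words = List[str]
Example = Tuple[Words, str]  # hypothetical example-base record: (segmented Chinese, English)


@dataclass
class OfflineTranslator:
    """Chains the four modules; every stage is a plug-in callable (hypothetical interface)."""

    segment: Callable[[str], Words]                          # Chinese input module
    retrieve: Callable[[Words, Sequence[Example]], Example]  # similarity computation module
    align: Callable[[Words, Example], Any]                   # alignment module
    generate: Callable[[Words, Example, Any], str]           # translation module
    examples: Sequence[Example]                              # local example base, no network needed

    def translate(self, sentence: str) -> str:
        words = self.segment(sentence)              # 1. segment the input sentence
        best = self.retrieve(words, self.examples)  # 2. pick the closest stored example
        links = self.align(words, best)             # 3. align input words with the example
        return self.generate(words, best, links)    # 4. produce English from the aligned example
```

Each stage is an injected function, so the segmentation, retrieval, alignment and generation strategies can be swapped independently. The example base is ordinary local data, which is what allows the design to work offline.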
Further, the specific steps of the word segmentation are as follows.
(1) Let L be the length of the sentence to be segmented and M the maximum word length in the dictionary. Take the string of length M starting at the first character of the sentence and match it against the dictionary.
(2) If it matches, split this string off from the sentence as a word. Treat the parts to the left and right of the word as new sentences, and repeat the process on them.
(3) If it does not match, take the string of length M starting at the second character and match it.
(4) If that does not match either, take the strings of length M starting at the third, fourth, ..., (L-M+1)-th characters in turn. If one matches, return to step (2). If none matches, the sentence contains no word of length M; set M to M-1 and match again from the first character using this new string length.
(5) Repeat the above process until all words in the sentence have been segmented.
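Steps (1) to (5) amount to repeatedly finding the longest dictionary word anywhere in the remaining text and recursing on the pieces to its left and right. The following Python sketch rests on three assumptions:
- the dictionary is an in-memory set of Chinese words;
- the window length is reset to M for each new sub-sentence;
- characters not covered by any dictionary entry fall back to single-character words (the patent does not say how unknown characters are handled).

```python
from __future__ import annotations


def segment(sentence: str, dictionary: set[str], max_len: int | None = None) -> list[str]:
    """Split `sentence` with the longest-word-anywhere procedure of steps (1)-(5)."""
    if not sentence:
        return []
    if max_len is None:
        # M: length of the longest word in the dictionary
        max_len = max((len(w) for w in dictionary), default=1)
    m = min(max_len, len(sentence))  # a window longer than the sentence cannot match
    while m >= 1:
        # Steps (1), (3), (4): slide a window of length m over start positions 1 .. L-m+1
        for i in range(len(sentence) - m + 1):
            word = sentence[i:i + m]
            if word in dictionary:
                # Step (2): cut the word out; the left and right remainders become new sentences
                return (segment(sentence[:i], dictionary, max_len)
                        + [word]
                        + segment(sentence[i + m:], dictionary, max_len))
        m -= 1  # Step (4): no word of length m exists, so try m - 1
    # No dictionary word at all, not even one character: fall back to single characters
    return list(sentence)
```

Take a dictionary containing 我们, 喜欢, 机器, 翻译 and 机器翻译. The sentence 我们喜欢机器翻译 is split as 我们 / 喜欢 / 机器翻译, because the four-character word is found before any shorter one. This differs from ordinary forward maximum matching, which always scans from the left edge.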
Further, after the word segmentation is completed, part-of-speech tagging and special-word processing are also performed on the segmentation result. Part-of-speech ambiguity is eliminated using the semantic information and rules of a semantic database, which improves the accuracy of part-of-speech tagging.
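The part-of-speech step can be pictured as a lexicon lookup followed by context rules for ambiguous words. The sketch below is a toy stand-in. The lexicon, the tag set and the three rules are hypothetical and replace the semantic database the patent refers to. Treating digit strings as numerals is only one illustrative kind of special-word processing.

```python
from __future__ import annotations

# Hypothetical lexicon standing in for the semantic database: word -> candidate POS tags
# (r = pronoun, v = verb, n = noun, u = particle, m = numeral, x = unknown)
LEXICON: dict[str, list[str]] = {
    "我们": ["r"],
    "喜欢": ["v"],
    "翻译": ["n", "v"],
    "工作": ["n", "v"],
    "的": ["u"],
}

# Hypothetical context rules: (tag of previous word, ambiguous candidates) -> resolved tag
RULES: dict[tuple[str, frozenset[str]], str] = {
    ("r", frozenset({"n", "v"})): "v",  # pronoun + noun/verb word: read it as the predicate
    ("v", frozenset({"n", "v"})): "n",  # verb + noun/verb word: read it as the object
    ("u", frozenset({"n", "v"})): "n",  # 的 + noun/verb word: read it as a noun
}


def tag(words: list[str]) -> list[tuple[str, str]]:
    """Tag each segmented word, resolving ambiguous words from the previous word's tag."""
    tags: list[str] = []
    for word in words:
        if word.isdigit():
            tags.append("m")  # special-word handling: digit strings are numerals
            continue
        candidates = LEXICON.get(word, ["x"])
        if len(candidates) == 1:
            tags.append(candidates[0])
            continue
        prev = tags[-1] if tags else ""
        # Fall back to the first listed candidate when no rule applies
        tags.append(RULES.get((prev, frozenset(candidates)), candidates[0]))
    return list(zip(words, tags))
```

In this sketch 翻译 is tagged as a verb after 我们 but as a noun after 喜欢. The ambiguity is resolved by rule rather than left to the first dictionary entry.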
Further, the similarity computation compares the sentence to be translated in terms of its overall structure. It extracts features of the sentence to be translated and uses the extracted features to search the database for similar sentences.
Further, the features include part of speech and semantics, and their comparison includes computing semantic distance and the connection relations between words.
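One way to turn these features into a single closeness score is a weighted sum, S = w_pos·S_pos + w_sem·S_sem + w_link·S_link. The three terms are a part-of-speech score, a semantic score and a word-connection score. None of the following choices comes from the patent; each is an assumption for illustration:
- the tag-sequence comparison uses Python's `difflib.SequenceMatcher`;
- the classes in `SEM_CLASS` are hypothetical stand-ins for a semantic distance taken from the database;
- word connection is approximated by the overlap of adjacent word pairs;
- the weights (0.3, 0.4, 0.3) are arbitrary.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import List, Sequence, Tuple

Tagged = List[Tuple[str, str]]  # segmented, POS-tagged sentence: [(word, tag), ...]

# Hypothetical semantic classes standing in for a semantic database (same class = close meaning)
SEM_CLASS = {"音乐": "art", "电影": "art", "喜欢": "like", "爱": "like"}


def pos_similarity(a: Tagged, b: Tagged) -> float:
    """Part-of-speech feature: how closely the two tag sequences match (0..1)."""
    return SequenceMatcher(None, [t for _, t in a], [t for _, t in b]).ratio()


def semantic_similarity(a: Tagged, b: Tagged) -> float:
    """Semantic feature: 1 per identical word, 0.5 per word sharing a class, averaged over a."""
    if not a:
        return 0.0
    b_words = {w for w, _ in b}
    b_classes = {SEM_CLASS[w] for w in b_words if w in SEM_CLASS}
    score = 0.0
    for w, _ in a:
        if w in b_words:
            score += 1.0
        elif SEM_CLASS.get(w) in b_classes:
            score += 0.5
    return score / len(a)


def link_similarity(a: Tagged, b: Tagged) -> float:
    """Word-connection feature: Jaccard overlap of adjacent word pairs."""
    pairs_a = {(x[0], y[0]) for x, y in zip(a, a[1:])}
    pairs_b = {(x[0], y[0]) for x, y in zip(b, b[1:])}
    if not pairs_a and not pairs_b:
        return 1.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


def similarity(a: Tagged, b: Tagged, weights: Tuple[float, float, float] = (0.3, 0.4, 0.3)) -> float:
    """Weighted combination of the three feature scores (weights are arbitrary)."""
    w_pos, w_sem, w_link = weights
    return (w_pos * pos_similarity(a, b)
            + w_sem * semantic_similarity(a, b)
            + w_link * link_similarity(a, b))


def most_similar(query: Tagged, examples: Sequence[Tagged]) -> Tagged:
    """Return the stored example sentence with the highest similarity to the query."""
    return max(examples, key=lambda ex: similarity(query, ex))
```

Consider the query 我们/喜欢/电影 and the stored example 我们/喜欢/音乐. This pair scores 1.0 on part of speech and 2.5/3 on semantics, because 电影 and 音乐 share a class. Only one of the two adjacent pairs is shared, so it scores 1/3 on connection, for a total of about 0.73.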
Further, the alignment includes setting the segmentation of each Chinese example sentence in the database. Following the order in which the words appear in the Chinese example sentence, the translation of the example sentence is divided into segments corresponding to the Chinese words, and the Chinese-English correspondence is stored. During alignment, word length and the result of the similarity computation above are taken into account. Fixed vocabulary whose translations occur most frequently in the Chinese-English dictionary is used as anchor nodes, and alignment is performed according to the correspondence stored in the database.
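The following simplified alignment sketch uses identical words as the fixed anchor nodes. The stretches between anchors are linked word for word when their lengths match, or as one block when they differ. It substitutes `difflib.SequenceMatcher.get_opcodes` for the patent's anchor selection, which relies on dictionary frequency, word length and similarity score. Read it as an illustration of the idea, not as the patented procedure.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of word indices


def align(query: List[str], example: List[str]) -> List[Tuple[Span, Span]]:
    """Link words of the sentence to be translated to words of the retrieved Chinese example.

    Identical words serve as fixed anchor nodes. Between anchors, stretches of equal length are
    linked word for word; stretches of unequal length are linked as one block (e.g. 2 -> 1 words).
    Words with no counterpart on the other side are left unlinked.
    """
    links: List[Tuple[Span, Span]] = []
    matcher = SequenceMatcher(None, query, example, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal" or (op == "replace" and i2 - i1 == j2 - j1):
            links.extend(((i1 + k, i1 + k + 1), (j1 + k, j1 + k + 1)) for k in range(i2 - i1))
        elif op == "replace":
            links.append(((i1, i2), (j1, j2)))
        # "insert" and "delete" opcodes produce no link
    return links
```

Aligning 我们/喜欢/看/电影 against the example 我们/喜欢/音乐 anchors the first two words. It then links the two words 看/电影 as a block to the single example word 音乐, which is the many-to-one case the translation step has to handle.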
Further, translating into English that meets the requirements specifically includes translating according to the above alignment result:
- If the sentence to be translated has the same word string as the corresponding example sentence in the database, the translation of the example sentence is reproduced directly.
- If the word strings differ, the translation of each word in the sentence to be translated replaces the corresponding word of the example's translation and is copied into the appropriate position of the new translation.
- When several words of the text to be translated are aligned to one word of the example sentence, these words are first translated and then, taken as a whole, replace the aligned part of the example's translation. Their translation is obtained by querying the database with these words for a similar lexical translation, and that result is used as the translation of the group.
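Generation by substitution can be sketched as editing the example's stored translation. All names below are hypothetical. The sketch makes three assumptions:
- the example's English is stored as segments in English word order, each tagged with the index of the Chinese example word it translates (one reading of the stored Chinese-English correspondence);
- alignment links arrive as (input span, example span) pairs;
- a plain dictionary `lexicon` stands in for the offline database lookup of multi-word groups.

```python
from __future__ import annotations

from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of word indices


def translate(query: List[str], example_zh: List[str], example_en: List[Tuple[str, int]],
              links: List[Tuple[Span, Span]], lexicon: Dict[str, str]) -> str:
    """Produce English by editing the stored translation of the retrieved example.

    example_en: English segments in English word order, each tagged with the index of the
                Chinese example word it translates (assumed form of the stored correspondence).
    links:      (query span, example span) pairs from the alignment step.
    lexicon:    hypothetical offline Chinese-English dictionary, may hold multi-word entries.
    """
    if query == example_zh:
        # Identical word strings: reuse the example's translation as it is
        return " ".join(text for text, _ in example_en)

    replacement: Dict[int, str] = {}  # first example index of a changed span -> new English
    swallowed: set[int] = set()       # remaining example indices covered by a changed span
    for (qs, qe), (es, ee) in links:
        words = query[qs:qe]
        if words == example_zh[es:ee]:
            continue  # unchanged anchor word: keep the example's own translation
        # Several input words may map to one example word: look the group up as a whole first,
        # otherwise fall back to word-by-word lookup (unknown words are passed through as-is)
        whole = lexicon.get("".join(words))
        replacement[es] = whole or " ".join(lexicon.get(w, w) for w in words)
        swallowed.update(range(es + 1, ee))

    output: List[str] = []
    for text, zh_index in example_en:  # walk the example translation in English order
        if zh_index in swallowed:
            continue
        output.append(replacement.get(zh_index, text))
    return " ".join(output)


if __name__ == "__main__":
    print(translate(
        ["我", "昨天", "看了", "纪录片"],
        ["我", "昨天", "看了", "电影"],
        [("I", 0), ("watched", 2), ("a film", 3), ("yesterday", 1)],
        [((0, 1), (0, 1)), ((1, 2), (1, 2)), ((2, 3), (2, 3)), ((3, 4), (3, 4))],
        {"纪录片": "a documentary"},
    ))  # prints: I watched a documentary yesterday
```

The English word order comes from the stored example rather than from the input, so a reordering such as moving "yesterday" to the end is inherited without any syntactic analysis. Input words that the alignment leaves unlinked are not handled in this sketch.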
The technical solution of the present invention has the following advantages:
It overcomes the dependence of the online translation described above on a network database. By processing only the Chinese sentence to be translated, it provides an intelligent offline translation machine based on similarity, in particular for Chinese-to-English translation. Through segmentation of the Chinese sentence and similarity computation, combined with preset English translation rules, it can still obtain accurate translation results.
Brief description of the drawings
Fig. 1 is a block diagram of the translation machine according to a preferred embodiment of the invention.
Detailed description of the embodiments
As shown in Fig. 1, an intelligent offline translation machine based on similarity comprises:
a Chinese input module, configured to receive an input Chinese sentence and to perform word segmentation according to a Chinese-English dictionary to obtain the correct segmentation;
a similarity computation module, configured to use certain features of the Chinese sentence to be translated as query conditions to retrieve similar sentences from a database, and to select the closest sentence according to the degree of similarity (that is, to perform similarity computation);
an alignment module, configured to align, according to preset alignment rules, the Chinese sentence to be translated with the sentence in the database, and to align the words of the Chinese sentence with the words of the English sentence in the database;
a translation module, configured to translate, according to preset English translation rules, into English that meets the requirements.
The specific steps of the word segmentation are as follows.
(1) Let L be the length of the sentence to be segmented and M the maximum word length in the dictionary. Take the string of length M starting at the first character of the sentence and match it against the dictionary.
(2) If it matches, split this string off from the sentence as a word. Treat the parts to the left and right of the word as new sentences, and repeat the process on them.
(3) If it does not match, take the string of length M starting at the second character and match it.
(4) If that does not match either, take the strings of length M starting at the third, fourth, ..., (L-M+1)-th characters in turn. If one matches, return to step (2). If none matches, the sentence contains no word of length M; set M to M-1 and match again from the first character using this new string length.
(5) Repeat the above process until all words in the sentence have been segmented.
After the word segmentation is completed, part-of-speech tagging and special-word processing are also performed on the segmentation result. Part-of-speech ambiguity is eliminated using the semantic information and rules of a semantic database, which improves the accuracy of part-of-speech tagging.
The similarity computation compares the sentence to be translated in terms of its overall structure. It extracts features of the sentence to be translated and uses the extracted features to search the database for similar sentences.
The features include part of speech and semantics, and their comparison includes computing semantic distance and the connection relations between words.
The alignment includes setting the segmentation of each Chinese example sentence in the database. Following the order in which the words appear in the Chinese example sentence, the translation of the example sentence is divided into segments corresponding to the Chinese words, and the Chinese-English correspondence is stored. During alignment, word length and the result of the similarity computation above are taken into account. Fixed vocabulary whose translations occur most frequently in the Chinese-English dictionary is used as anchor nodes, and alignment is performed according to the correspondence stored in the database.
Translating into English that meets the requirements specifically includes translating according to the above alignment result:
- If the sentence to be translated has the same word string as the corresponding example sentence in the database, the translation of the example sentence is reproduced directly.
- If the word strings differ, the translation of each word in the sentence to be translated replaces the corresponding word of the example's translation and is copied into the appropriate position of the new translation.
- When several words of the text to be translated are aligned to one word of the example sentence, these words are first translated and then, taken as a whole, replace the aligned part of the example's translation. Their translation is obtained by querying the database with these words for a similar lexical translation, and that result is used as the translation of the group.
The technical solution of the present invention has the following advantages:
It overcomes the dependence of the online translation described above on a network database. By processing only the Chinese sentence to be translated, it provides an intelligent offline translation machine based on similarity, in particular for Chinese-to-English translation. Through segmentation of the Chinese sentence and similarity computation, combined with preset English translation rules, it can still obtain accurate translation results.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (7)

1. An intelligent offline translation machine based on similarity, characterized by comprising:
a Chinese input module, configured to receive an input Chinese sentence and to perform word segmentation according to a Chinese-English dictionary to obtain the correct segmentation;
a similarity computation module, configured to use certain features of the Chinese sentence to be translated as query conditions to retrieve similar sentences from a database, and to select the closest sentence according to the degree of similarity (that is, to perform similarity computation);
an alignment module, configured to align, according to preset alignment rules, the Chinese sentence to be translated with the sentence in the database, and to align the words of the Chinese sentence with the words of the English sentence in the database;
a translation module, configured to translate, according to preset English translation rules, into English that meets the requirements.
2. The intelligent offline translation machine based on similarity according to claim 1, wherein the specific steps of the word segmentation are: (1) letting L be the length of the sentence to be segmented and M the maximum word length in the dictionary, and taking the string of length M starting at the first character of the sentence and matching it against the dictionary; (2) if it matches, splitting this string off from the sentence as a word, treating the parts to the left and right of the word as new sentences, and repeating the process on them; (3) if it does not match, taking the string of length M starting at the second character and matching it; (4) if that does not match either, taking the strings of length M starting at the third, fourth, ..., (L-M+1)-th characters in turn, and returning to step (2) if one matches; if none matches, the sentence contains no word of length M, so setting M to M-1 and matching again from the first character using this new string length; (5) repeating the above process until all words in the sentence have been segmented.
3. The intelligent offline translation machine based on similarity according to claim 2, wherein, after the word segmentation is completed, part-of-speech tagging and special-word processing are also performed on the segmentation result, and part-of-speech ambiguity is eliminated using the semantic information and rules of a semantic database, thereby improving the accuracy of part-of-speech tagging.
4. The intelligent offline translation machine based on similarity according to claim 1, wherein the similarity computation compares the sentence to be translated in terms of its overall structure, extracts features of the sentence to be translated, and uses the extracted features to search the database for similar sentences.
5. The intelligent offline translation machine based on similarity according to claim 4, wherein the features include part of speech and semantics, and their comparison includes computing semantic distance and the connection relations between words.
6. The intelligent offline translation machine based on similarity according to claim 1, wherein the alignment includes setting the segmentation of each Chinese example sentence in the database; dividing, following the order in which the words appear in the Chinese example sentence, the translation of the example sentence into segments corresponding to the Chinese words, while storing the Chinese-English correspondence; and, during alignment, taking into account word length and the result of the similarity computation above, using as anchor nodes fixed vocabulary whose translations occur most frequently in the Chinese-English dictionary, and performing alignment according to the correspondence stored in the database.
7. The intelligent offline translation machine based on similarity according to claim 2, wherein translating into English that meets the requirements specifically includes translating according to the above alignment result, such that: if the sentence to be translated has the same word string as the corresponding example sentence in the database, the translation of the example sentence is reproduced directly; if the word strings differ, the translation of each word in the sentence to be translated replaces the corresponding word of the example's translation and is copied into the appropriate position of the new translation; and when several words of the text to be translated are aligned to one word of the example sentence, these words are first translated and then, taken as a whole, replace the aligned part of the example's translation, their translation being obtained by querying the database with these words for a similar lexical translation, which is used as the translation of the group.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810064998.XA CN108153743B (en) 2018-01-23 2018-01-23 Intelligent off-line translation machine based on similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810064998.XA CN108153743B (en) 2018-01-23 2018-01-23 Intelligent off-line translation machine based on similarity

Publications (2)

Publication Number Publication Date
CN108153743A true CN108153743A (en) 2018-06-12
CN108153743B CN108153743B (en) 2021-12-17

Family

ID=62456777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810064998.XA Active CN108153743B (en) 2018-01-23 2018-01-23 Intelligent off-line translation machine based on similarity

Country Status (1)

Country Link
CN (1) CN108153743B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1652106A (en) * 2004-02-04 2005-08-10 北京赛迪翻译技术有限公司 Machine translation method and apparatus based on language knowledge base
US20120232882A1 (en) * 2009-08-14 2012-09-13 Longbu Zhang Method for patternized record of bilingual sentence-pair and its translation method and translation system
CN106874263A * 2017-01-17 2017-06-20 中译语通科技(北京)有限公司 A Chinese-English corpus proofreading method based on multi-dimensional data analysis and semantics

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611811A (en) * 2020-05-25 2020-09-01 腾讯科技(深圳)有限公司 Translation method, translation device, electronic equipment and computer readable storage medium
CN111611811B (en) * 2020-05-25 2023-01-13 腾讯科技(深圳)有限公司 Translation method, translation device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN108153743B (en) 2021-12-17

Similar Documents

Publication Publication Date Title
Gouws et al. Simple task-specific bilingual word embeddings
KR101130444B1 (en) System for identifying paraphrases using machine translation techniques
CN104679850B (en) Address structure method and device
CN100437557C (en) Machine translation method and apparatus based on language knowledge base
CN111539229A (en) Neural machine translation model training method, neural machine translation method and device
WO2008107305A2 (en) Search-based word segmentation method and device for language without word boundary tag
CN103116578A (en) Translation method integrating syntactic tree and statistical machine translation technology and translation device
CN107656921B (en) Short text dependency analysis method based on deep learning
CN111382571B (en) Information extraction method, system, server and storage medium
CN108804592A (en) Knowledge library searching implementation method
Abdurakhmonova et al. Linguistic functionality of Uzbek Electron Corpus: uzbekcorpus.uz
Kunchukuttan et al. Learning variable length units for SMT between related languages via byte pair encoding
CN113343717A (en) Neural machine translation method based on translation memory library
CN110705285B (en) Government affair text subject word library construction method, device, server and readable storage medium
CN117251524A (en) Short text classification method based on multi-strategy fusion
Abdurakhmonova Formal-Functional Models of The Uzbek Electron Corpus
Wang et al. Semi-supervised chinese open entity relation extraction
CN108255818A (en) Compound machine translation method using segmentation techniques
Anju et al. Malayalam to English machine translation: An EBMT system
Islam et al. A vocabulary-free multilingual neural tokenizer for end-to-end task learning
CN108153743A (en) Intelligent offline translation machine based on similarity
CN110888940A (en) Text information extraction method and device, computer equipment and storage medium
CN108280066A (en) An offline Chinese-to-English translation method
Seresangtakul et al. Thai-Isarn dialect parallel corpus construction for machine translation
Devi et al. Steps of pre-processing for english to mizo smt system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211126

Address after: Room 502, building 1, No. a, Beibinhe Road, Guang'anmenwai, Xicheng District, Beijing 100032

Applicant after: Jiaguyi (Beijing) Language Technology Co.,Ltd.

Address before: 610000 No. 10 Jiuxing Avenue, Chengdu High-tech Zone, Sichuan Province

Applicant before: CHENGDU HAIZHIYI TRANSLATION CO.,LTD.

GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 101399 12-113, No. 2, CAIDA Second Street, Nancai Town, Shunyi District, Beijing

Patentee after: Jiaguyi (Beijing) Language Technology Co.,Ltd.

Address before: Room 502, building 1, No. a, Beibinhe Road, Guang'anmenwai, Xicheng District, Beijing 100032

Patentee before: Jiaguyi (Beijing) Language Technology Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Intelligent Offline Translation Machine Based on Similarity

Effective date of registration: 20230921

Granted publication date: 20211217

Pledgee: Zhongguancun Beijing technology financing Company limited by guarantee

Pledgor: Jiaguyi (Beijing) Language Technology Co.,Ltd.

Registration number: Y2023990000471