GB2096374A - Translating devices - Google Patents

Info

Publication number
GB2096374A
Authority
GB
United Kingdom
Prior art keywords
language
source
module
target
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB8110483A
Other versions
GB2096374B (en)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BAE Systems Electronics Ltd
Original Assignee
Marconi Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Marconi Co Ltd
Priority to GB8110483A
Publication of GB2096374A
Application granted
Publication of GB2096374B
Expired

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation

Abstract

A translating device translates one natural human language into another via an intermediate language, to reduce the number of dictionaries required when there are a number of possible source and target languages.

Description

SPECIFICATION
Translating devices
This invention concerns translating devices, and relates in particular to apparatus for the automatic translation of one language into another.
With the advent of the microprocessor, coupled with the ability to store large amounts of data in a relatively small space, it has become possible to build small, fast, computer-type machines capable of translating information (such as speech or text) in one human language into another. There are presently commercially available a number of such machines; the larger are capable of full and fairly idiomatic translation of many thousands of words and expressions both on their own and in the form of whole sentences, while the smaller ones, though usually having a rather limited ability, are small enough to be held in the hand.
Though doubtless their specific modes of operation are different, these machines seem to have their general principles in common. Briefly, each machine contains in some form or other a dictionary for the two languages (for two-way translation of, say, French and German the machine needs both a French-into-German and a German-into-French dictionary), and two sets of rules relating to the syntax of the two languages concerned. In operation the machine accepts as input a word or string of words in one language (the source language), uses the syntax rules for that language to decide what sort of word it is/they are, which enables it to use the dictionary properly to ascertain the correct translation into the second language (the target language), and finally uses the syntax rules for the target language to construct a word or string of words output in that language.
This type of system suffers from at least one serious shortcoming: there have to be two dictionaries for every possible source/target pair of languages. Thus: for two languages (French and German, say) there have to be two dictionaries (French-to-German and German-to-French); for three languages (French, German and Spanish, say) there have to be six dictionaries (French-to-German, German-to-French; French-to-Spanish, Spanish-to-French; German-to-Spanish, Spanish-to-German); for four languages there have to be twelve dictionaries; and ten languages need ninety dictionaries. Indeed, for any number N of languages there are $N!/(N-2)! = N^2 - N$ ordered pairs of languages, requiring the same number of dictionaries. As is clearly evident, the cost and complexity of a system adapted to translate any two of even as few as ten languages is extremely high.
The present invention seeks to reduce the problem, and the cost, down to manageable proportions by the application of a concept that seems to be quite new in the machine translation area, namely the translation in a first stage of each source language into a common intermediate language, followed by the translation from that intermediate language into the or each target language; using this concept, while two languages now need four dictionaries, and while three still need six, four languages only need eight (instead of twelve), and ten languages only need twenty (instead of ninety). Indeed, for any N languages there are only 2N pairs with the intermediate language (N into it, N out of it), requiring merely the same number, 2N, of dictionaries. Clearly, the application of this intermediate language concept will very considerably reduce the cost and complexity of machine translation.
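The arithmetic of the two schemes may be checked with a short sketch. The function names below are purely illustrative and form no part of the specification; the formulas are those stated above (N² − N dictionaries for direct translation, 2N via an intermediate language).

```python
def direct_dictionaries(n: int) -> int:
    """Dictionaries needed when every ordered source/target pair has its own."""
    return n * (n - 1)  # N!/(N-2)! = N^2 - N


def intermediate_dictionaries(n: int) -> int:
    """Dictionaries needed with one common intermediate language."""
    return 2 * n  # N into the intermediate language, N out of it


if __name__ == "__main__":
    # Tabulate the cases mentioned in the text: 2, 3, 4 and 10 languages.
    for n in (2, 3, 4, 10):
        print(n, direct_dictionaries(n), intermediate_dictionaries(n))
```

The two counts are equal at three languages, and the direct scheme is actually cheaper at two; the saving appears from four languages upward.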
In one aspect, therefore, this invention provides a translation device which includes a source language module, into which there is input the source language to be translated from and in which that source language is translated into an intermediate language which is thereafter output from the source module, in association with a target language module, into which there is input the intermediate language output of the source module and in which the intermediate language is translated into the target language which is thereafter output from the target module.
The device of the invention includes a source language module and a target language module, and (as discussed hereinafter) in use these two will form part of a complete translation system including input/output means and so on. The two modules are associated together both in the sense that the output of the former is the input of the latter and in the sense that each makes use of the same intermediate language. Moreover, it will generally be the case that the two modules are physically associated (being adjacent parts of the whole translation system).
It is here convenient to point out that, since it is a prime purpose of the invention to reduce the complexity and cost of translating machines, while each module may be no more than a portion of and integral with the whole machine, nevertheless it is very much preferred that each module in fact be a discrete physical entity, readily physically replaceable by some other module of the same sort (source or target, as appropriate). Indeed, it is preferred that each module be of a "plug-in" type so that it can be so physically replaced with the minimum of difficulty. Thus, if it is required to translate from the chosen source language not into the first chosen target language but into a second, then the first target language module may simply be removed and replaced by the second target language module, no other conceptual change being necessary.
Similarly, if it is required to translate not from the first chosen source language but from a second, then the first source language module may simply be removed and replaced by the second source module, again without any other conceptual change being necessary.
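The "plug-in" arrangement can be pictured in software terms roughly as follows. This is only a sketch under stated assumptions: the class names, the word-for-word lookup and the tiny dictionaries are all hypothetical, and Esperanto words stand in for the intermediate language simply because Esperanto is the presently preferred choice.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: intermediate-language text is modelled as a list of IL words.
ILText = List[str]


@dataclass
class SourceModule:
    name: str
    to_il: Dict[str, str]  # source word -> intermediate-language word

    def translate(self, text: str) -> ILText:
        return [self.to_il[word] for word in text.lower().split()]


@dataclass
class TargetModule:
    name: str
    from_il: Dict[str, str]  # intermediate-language word -> target word

    def translate(self, il_text: ILText) -> str:
        return " ".join(self.from_il[word] for word in il_text)


@dataclass
class TranslationDevice:
    source: SourceModule
    target: TargetModule

    def translate(self, text: str) -> str:
        # The output of the source module is the input of the target module.
        return self.target.translate(self.source.translate(text))


# Toy modules (illustrative data only).
french = SourceModule("French", {"cheval": "ĉevalo"})
german = TargetModule("German", {"ĉevalo": "Pferd"})
spanish = TargetModule("Spanish", {"ĉevalo": "caballo"})

device = TranslationDevice(source=french, target=german)
first = device.translate("cheval")   # French -> German
device.target = spanish              # unplug one target module, plug in another
second = device.translate("cheval")  # French -> Spanish, source module untouched
```

Replacing the target module changes nothing on the source side, which is the point of sharing one intermediate language.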
It is the purpose of the device of the invention to translate from one language (source) into another (target) via an intermediate language. By source and target language is generally meant a natural human language (tongue) as exemplified by, say, French, German or Spanish, though it is not presently intended to exclude artificial languages, nor is it presently intended to exclude languages which are not human.
By intermediate language is generally meant much the same (a natural human language), but also not excluding artificial and/or non-human languages. However, in the case of the intermediate language there is very preferably applied what may be regarded as the second main feature of the invention, namely that in order to reduce the problems of translating into and out of the intermediate language this latter should as near as possible be a perfect language, the term "perfect language" here meaning a language that has no, or very few, irregularities of any sort.
Accordingly, since, by the very nature of their origins and development, few (if any) natural human languages are anything like perfect, the intermediate language is very preferably an artificial language, and may be either one of those artificial human languages known now or in the future (most of these, such as Volapük, Ido, Interlingua, Novial and Esperanto, which latter is presently preferred, are less "imperfect" than any available popular natural human language) or a language especially designed for this purpose.
It should be noted, incidentally, that the source and target languages do not necessarily need to be different. Instead, the translating device of the invention may be used to change (to "translate"), for example, the tense (say, from past to future) or the person (from singular to plural). Moreover, it should also be noted that a single translation machine could easily include more than one target language module at once, so providing simultaneous translation of a source language into several target languages.
The translation device of the invention will conveniently include appropriate input/output means, and these may be any considered suitable: for example, as input means keyboards, light pens, punched or magnetic tapes, vocoders and character readers, and as output means printers, tapes, VDUs and speech synthesizers.
The translation device of the invention is presently best realised in terms of an electronic device including the required data stores and one or more data-handling microprocessors, and the overall architecture of such a device is not too dissimilar to that of two of the presently-available translating machines in tandem. Broadly, then, the device as a whole comprises two recognisable "halves". The input half comprises: input means; a digitiser; a source language input buffer; a source language sentence and syntax analyser; a source language word analyser; a source-to-intermediate language dictionary; a general store and control unit; a source-to-intermediate language syntax changer; and an intermediate language output buffer. The output half comprises: an intermediate language input buffer; an intermediate-to-target language syntax changer; an intermediate-to-target language dictionary; a general store and control unit; and output means. All these sections may briefly be explained as follows (where "SL", "IL" and "TL" are Source Language, Intermediate Language and Target Language respectively):
For the input half
The input means may, as discussed hereinbefore, be of any suitable kind, and at present the most likely input means is a keyboard.
The digitiser is required because it is usually most convenient to handle all input in digital, specifically binary digital, form before, during and after each translation (prior to the final output).
The SL input buffer holds part of the digitised text that is to be translated. It is only required to store a portion of the input text (for example, a paragraph or a single sentence), the subsequent processing being achieved portion by portion (thus, paragraph by paragraph, or sentence by sentence) rather than as a continuous flow mechanism. Naturally, the operation of the input means is controlled in corresponding fashion; while it is possible to run the input continuously, this makes the whole system operate more slowly (as the overall rate then needs to be set according to the most time-consuming section of text to be translated that is envisaged) unless a very large buffer is used as a "smoothing" system.
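Portion-by-portion handling of the input can be sketched as below. The sketch assumes, purely for illustration, that a portion is a single sentence ended by '.', '!' or '?'; the function name `portions` is hypothetical and not taken from the specification.

```python
from typing import Iterable, Iterator, List


def portions(chars: Iterable[str]) -> Iterator[str]:
    """Yield the input one sentence at a time instead of as a continuous flow.

    Only the sentence currently being assembled is held in the buffer.
    """
    buffer: List[str] = []
    for ch in chars:
        buffer.append(ch)
        if ch in ".!?":  # assumed end-of-portion markers
            sentence = "".join(buffer).strip()
            if sentence:
                yield sentence
            buffer.clear()
    # Pass on any unterminated text left in the buffer at end of input.
    tail = "".join(buffer).strip()
    if tail:
        yield tail


sample = "Bonjour. Vous ne parlez pas doucement! Fin"
result = list(portions(sample))
```

Because `portions` is a generator, the consumer pulls the next sentence only when it is ready for it, which loosely mirrors the control unit starting and stopping the input means.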
Under the overall control of the general store and control circuit, the SL sentence and syntax analyser analyses the text portion held in the SL input buffer in conjunction with the SL word analyser which itself is operated in conjunction with the SL-to-IL dictionary. The sentence and syntax analyser and the word analyser together perform operations such as sorting out and associating coupled words (for example, auxiliary verbs and their main verbs), and also relate adjectives and adverbs with their associated words.
The output of these analytical units is in the chosen intermediate language; it is passed to the SL-to-IL syntax changer where it is converted to standard syntactic form, then stored in the IL output buffer, where it is ready to be coupled to the output half of the translating machine.
As intimated, the SL general store and control unit oversees the operation of all the sections.
For the output half
The IL input buffer interfaces with the IL output buffer of the input half, and accepts therefrom the stored intermediate language text.
The IL-to-TL syntax changer re-orders the intermediate language text into the syntax appropriate to the required target language (the rules of syntax, and the positioning of auxiliary verbs, adjectives, etc., are held in the TL general store and control unit).
In conjunction with the IL-to-TL dictionary, the TL general store then produces the required target language output.
Finally, the digitised target language is passed to the output means (a printer or VDU, for example) where it is converted back into a more suitable, human-readable form.
Although the above-described electronic device of the invention may be so constructed that each language module contains all the components required, being complete in itself, it is more conveniently arranged that, while the device as a whole includes all the necessary components, the two modules include only those that are specific to the particular languages concerned. Thus, for the input half of the device the input means (except when, for example, a peculiar alphabet is involved), digitiser, SL input buffer and IL output buffer are common to all languages, and any one source language module itself need only contain the SL sentence and syntax analyser, SL word analyser, SL-to-IL dictionary, SL-to-IL syntax changer and SL general store and control unit. Similarly, for the output half of the device the IL input buffer and the output means (except when, for example, the latter requires a special alphabet) are common to all languages, and any one target language module itself need only contain the IL-to-TL syntax changer, IL-to-TL dictionary and TL general store and control unit.
The invention extends, of course, to a translating machine whenever using a modular translating device as described and claimed herein.
One embodiment of the invention is now described, though only by way of illustration, with reference to the accompanying drawings in which: the Figure shows in block diagram form a translating machine utilising the device of the invention.
The Figure itself needs no description.
However, the mode of operation of the depicted device does, and may be described as follows.
In the design and manufacture of the two portions of the device a certain amount of software in the form of computer programming is necessary. For the purposes of this description it is assumed that this preparatory work has been done.
Input of source language
1. When the SL input buffer is vacant or nearly so the SL General Store and Control Unit (GSCU) starts, or restarts, the SL input means so as to initiate the passing of SL text into the digitiser where it is converted into a convenient digital machine code. From the digitiser the thus digitised SL text is passed into the SL input buffer.
When the SL input buffer is full, the SL GSCU stops the SL input means passing more of the SL text.
2. The first SL text sentence in the SL input buffer contents is passed to the SL sentence/syntax analyser, the individual words being passed to the SL word analyser. In the latter the words are analysed into morphemes (which are the smallest syntactic units). Each set of morphemes constituting a word is analysed as a word, and the grammatical characteristics of the word noted against the word. This analysis is made in conjunction with the SL-to-IL dictionary.
For example, the French 'vous ne parlez pas doucement' would be divided into 'vous/ne/parl/ez/pas/dou/ce/ment'.
'vous' would be found as a personal pronoun, second person plural.
'ne' would be noted as part of a word or phrase.
'parl' would be recognised as a morpheme associated with speaking and could have an associated significance scale number to distinguish it from, say, causer (to chat) or crier (to shout).
'ez' would be recognised as a verb ending (second person, plural, present indicative).
'pas' would be recognised as a 'step' or as a morpheme used with a number of other words, such as 'ne'.
The SL sentence and word analyser would examine the sentence for word associations, e.g. the 'ne' and 'pas' in the previously-used example of a sentence (but see further below).
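The division into morphemes illustrated above might be sketched as a longest-match segmentation against a morpheme list. Everything here is an assumption made for illustration: the specification does not say how the word analyser works, and the small lexicon simply contains the eight morphemes of the example (those for 'dou', 'ce' and 'ment' are not glossed in the text).

```python
from typing import Dict, List, Optional

# Toy morpheme lexicon; the glosses follow the example in the text.
MORPHEMES: Dict[str, str] = {
    "vous": "personal pronoun, second person plural",
    "ne": "part of a word or phrase",
    "parl": "morpheme associated with speaking",
    "ez": "verb ending: second person plural, present indicative",
    "pas": "'step', or a morpheme used with other words such as 'ne'",
    "dou": "(not glossed in the text)",
    "ce": "(not glossed in the text)",
    "ment": "(not glossed in the text)",
}


def segment(word: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Split a word into known morphemes, trying the longest prefix first.

    Returns None if no complete segmentation exists.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:  # otherwise backtrack to a shorter prefix
                return [head] + rest
    return None


def segment_sentence(sentence: str) -> List[str]:
    pieces: List[str] = []
    for word in sentence.lower().split():
        parts = segment(word, MORPHEMES)
        if parts is None:
            raise ValueError(f"no segmentation found for {word!r}")
        pieces.extend(parts)
    return pieces


morphemes = segment_sentence("vous ne parlez pas doucement")
```

Associating 'ne' with 'pas' would be a further step on top of this list and is not attempted here.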
3. In the SL-to-IL language dictionary it can be written into the initial software that some words have a "significance" scale associated with them.
For example, against the word 'doucement' (in the most general terms, "slowly") there can be a scaling number of, say, 3 to fit with a scale of, say, twelve degrees between the words associated with the slowest and fastest type of action, together with another number scaled for the degree of agitation from smooth calm to highly excited (this latter can be on a scale of, say, 10, and the word rated as 4).
Words that may be defined without ambiguity can be entered into the SL-to-lL dictionary by the use of some standard internationally-useable source, for example the Oxford English Dictionary.
The words can be codified for the intermediate language (e.g. volume number, page number, column number, word number, meaning number); the coding is required to allow for the insertion of new words, which can be achieved as an addendum with an ever-growing content.
The one standard word dictionary (e.g., O.E.D.) would be in common use as a source of definitions of all words in all the used target languages-thus, for 'horse', a word which appears to have a few common morphemes in common languages.
The SL-to-IL dictionary contains both morphemes and the grammatical characteristics of words as sets of morphemes.
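One way to picture a dictionary entry carrying the coding and the significance scales described in step 3 is sketched below. The field names are hypothetical, and the volume/page/column/word/meaning numbers are invented placeholders rather than real dictionary references; only the scale values for 'doucement' (3 on a scale of twelve, 4 on a scale of ten) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ILCode:
    """Intermediate-language code built from a position in a standard dictionary."""
    volume: int
    page: int
    column: int
    word: int
    meaning: int


@dataclass(frozen=True)
class DictionaryEntry:
    source_word: str
    il_code: ILCode
    grammar: str
    speed: Optional[int] = None      # 1 (slowest) to 12 (fastest)
    agitation: Optional[int] = None  # 1 (smooth calm) to 10 (highly excited)

    def __post_init__(self) -> None:
        if self.speed is not None and not 1 <= self.speed <= 12:
            raise ValueError("speed must lie on the scale 1 to 12")
        if self.agitation is not None and not 1 <= self.agitation <= 10:
            raise ValueError("agitation must lie on the scale 1 to 10")


# Placeholder code numbers; scale values as given in the text.
doucement = DictionaryEntry(
    source_word="doucement",
    il_code=ILCode(volume=1, page=1, column=1, word=1, meaning=1),
    grammar="adverb",
    speed=3,
    agitation=4,
)
```

New words can be given codes beyond those of the printed dictionary, which corresponds to the addendum mentioned above.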
4. The words (and any associated words, as "ne" and "pas") in the sentence are analysed syntactically by the SL sentence analyser. The sentence is analysed, using the language convention appropriate to the SL, by studying the positions of parts of speech and by the implied association of words that is detectable by grammatical agreement (i.e., gender, number and person). From this the form of the sentence may be detected, be it declaratory, interrogative, imperative or exclamatory, and for this, and the sequence of words that are associated with the sentence form, the sentence may be analysed into subject and predicate and (following the grammatical rules) into phrases and clauses.
5. The IL words produced by the SL-to-IL dictionary are now re-arranged in a formal manner to construct the intermediate language.
The SL sentence then exists as a string of lexical units taken from the IL dictionary together with their grammatical characteristics and with an expression that characterises the form of construction of the SL sentence. This IL 'text' is now passed to the output buffer.
Output of target language
1. The IL input buffer accepts (under the control of the TL GSCU) the IL sentences with the codings from the SL module output buffer.
2. The IL-to-TL syntax changer then re-orders the words, phrases, and clauses according to the standard form and requirements of the TL grammar, the rules of which are stored in the TL GSCU.
3. The IL-to-TL dictionary then translates on a word by word basis from the intermediate language to the target language.
4. The grammar of the target language is then corrected to fit the grammatical rules concerning the peculiarities of the target language.
For example: In English: I do not speak Spanish; In French: Je ne parle pas Espagnol; In Spanish: No hablo Español.
In the Spanish the personal pronoun 'Yo' is usually elided, except when emphasis is required.
5. Finally, with the syntax and words changed to suit the target language, the target language text is passed in digital form to the TL output, where it is reconstituted into a form more easily read by humans.
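Steps 2 to 4 of the output half (re-ordering, word-by-word dictionary look-up, then correction for the peculiarities of the target language) can be sketched for the Spanish example. The representation of the intermediate text as (role, code) pairs, the role names, the code names and the one-sentence dictionary are all assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

ILUnit = Tuple[str, str]  # (grammatical role, intermediate-language code)

# Hypothetical target-module data for Spanish.
SPANISH_ORDER: List[str] = ["subject_pronoun", "negation", "verb", "object"]
SPANISH_DICT: Dict[str, str] = {
    "I": "yo",
    "NOT": "no",
    "SPEAK.1SG": "hablo",
    "SPANISH": "Español",
}


def reorder(units: List[ILUnit], order: List[str]) -> List[ILUnit]:
    """Step 2: arrange the units in the standard order of the target grammar."""
    return sorted(units, key=lambda unit: order.index(unit[0]))


def look_up(units: List[ILUnit], dictionary: Dict[str, str]) -> List[ILUnit]:
    """Step 3: translate word by word from the intermediate language."""
    return [(role, dictionary[code]) for role, code in units]


def correct_spanish(units: List[ILUnit]) -> List[ILUnit]:
    """Step 4: the personal pronoun is usually elided in Spanish."""
    return [unit for unit in units if unit[0] != "subject_pronoun"]


def render(units: List[ILUnit]) -> str:
    text = " ".join(word for _, word in units)
    return text[:1].upper() + text[1:]


def translate_to_spanish(units: List[ILUnit]) -> str:
    return render(correct_spanish(look_up(reorder(units, SPANISH_ORDER), SPANISH_DICT)))


# Intermediate text in an arbitrary order, as it might arrive from a source module.
il_sentence = [
    ("verb", "SPEAK.1SG"),
    ("subject_pronoun", "I"),
    ("object", "SPANISH"),
    ("negation", "NOT"),
]
spanish = translate_to_spanish(il_sentence)
```

The emphatic case, where 'Yo' is retained, would need a flag on the intermediate text and is left out of this sketch.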

Claims (1)

  1. Claims
    1. A translation device which includes a source language module, into which there is input the source language to be translated from and in which that source language is translated into an intermediate language which is thereafter output from the source module, in association with a target language module, into which there is input the intermediate language output of the source module and in which the intermediate language is translated into the target language which is thereafter output from the target module.
    2. A translation device as claimed in claim 1, wherein the source language module and the target language module are each a discrete physical entity, readily physically replaceable by some other module of the same sort (source or target, as appropriate).
    3. A translation device as claimed in either of the preceding claims, wherein the intermediate language is an artificial language, less "imperfect" than any available popular natural human language.
    4. A translation device as claimed in any of the preceding claims, wherein there is more than one target language module at once, so enabling simultaneous translation of a source language into several target languages.
    5. A translation device of the invention as claimed in any of the preceding claims, which comprises: a) an input half itself comprising: input means; a digitiser; a source language input buffer; a source language sentence and syntax analyser; a source language word analyser; a source-to-intermediate language dictionary; a general store and control unit; a source-to-intermediate language syntax changer; and an intermediate language output buffer; and b) an output half itself comprising: an intermediate language input buffer; an intermediate-to-target language syntax changer; an intermediate-to-target language dictionary; a general store and control unit; and output means.
    6. A translation device as claimed in claim 5, wherein the source language module itself only contains the SL sentence and syntax analyser, SL word analyser, SL-to-IL dictionary, SL-to-IL syntax changer and SL general store and control unit, while the target language module itself only contains the IL-to-TL syntax changer, IL-to-TL dictionary and TL general store and control unit.
    7. A translating device as claimed in any of the preceding claims and substantially as described hereinbefore.
    8. A translating machine whenever using a translating device as claimed in any of the preceding claims.
    New Claims or Amendments to Claims filed on 3 Dec 1981.
    Superseded Claim 1.
    New or Amended Claims:
    1. A human language translation device which includes a source language module, into which there is input the source human language to be translated from and in which that source language is translated into an intermediate language which is thereafter output from the source module, in association with a target language module, into which there is input the intermediate language output of the source module and in which the intermediate language is translated into the target human language which is thereafter output from the target module.
GB8110483A 1981-04-03 1981-04-03 Translating devices Expired GB2096374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB8110483A GB2096374B (en) 1981-04-03 1981-04-03 Translating devices

Publications (2)

Publication Number Publication Date
GB2096374A (en) 1982-10-13
GB2096374B (en) 1984-05-10

Family

ID=10520895

Country Status (1)

Country Link
GB (1) GB2096374B (en)

Also Published As

Publication number Publication date
GB2096374B (en) 1984-05-10

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee