GB1596411A - Translation system - Google Patents

Publication number
GB1596411A
Authority
GB
United Kingdom
Prior art keywords
word
clause
sentence
information
encoded
Prior art date
Legal status
Expired
Application number
GB34030/77A
Current Assignee
SHADIN HOJIN KYODO TSUSHINSHA
Original Assignee
SHADIN HOJIN KYODO TSUSHINSHA
Priority date
Filing date
Publication date
Application filed by SHADIN HOJIN KYODO TSUSHINSHA
Priority to GB34030/77A
Publication of GB1596411A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/157 Transformation using dictionaries or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Description

(54) A TRANSLATION SYSTEM
(71) We, SHADAN HOJIN KYODO TSUSHINSHA, also known as KYODO NEWS SERVICE, a Japanese body corporate, of 2, Akasaka Aoicho, Minato-ku, Tokyo, Japan, do hereby declare the invention, for which we pray that a patent may be granted to us, and the method by which it is to be performed, to be particularly described in and by the following statement:
Abstract of the Disclosure
A translation system is provided for converting a Roman sentence, which represents the phonetic reading of the Japanese language in the Roman alphabet, into a kanji and kana compounded sentence of the kind usually used in Japan. A first encoded signal indicative of the Roman sentence is converted into a second encoded signal representing a kana sentence corresponding to the Roman sentence. The second encoded signal is then converted into a third encoded signal representing a kanji and kana compounded sentence corresponding to the kana sentence. In the present invention, the complicated problems associated with the specific nature of the Japanese language, which occur during the conversion from the second to the third signal, are overcome.
Field and Background of the Invention
This invention relates to a system for translating a Japanese expression represented in terms of the Roman alphabet (Latin alphabet) into an ordinary expression represented in terms of the normal Japanese language.
An expression represented in the normal Japanese language employs five kinds of characters: 1) kanji (which is usually translated as "Chinese character", but it should be understood that "kanji" is different from Chinese character in the true sense of the word. A phonetical reading is employed herein as indicated under the paragraph "Definition".), 2) hiragana and 3) katakana (as phonetically read. See "Definition".), 4) Roman alphabetical characters and 5) Arabian figures. Of these characters, the kana, which generically includes hiragana and katakana, and the kanji are most frequently used in Japanese expressions, their frequency of use being approximately 63% and 36%, respectively, according to Report 1972 of National Japanese Language Research Institute of Japan, which gives an estimate of the frequency of use of various characters in standard Japanese sentences. For this reason, a Japanese sentence is commonly referred to as a kanji and kana compounded sentence. A kana, inclusive of hiragana and katakana, is a syllable character (see "Definition"), and there are nearly 75 characters for each of the hiragana and katakana, though this figure varies in accordance with the interpretation of the concept "character". However, about five of these characters are very infrequently used. A kanji is a logograph character (see "Definition") and there are as many as tens of thousands of kanji. However, only 1850 characters are permitted for use in official documents. The number of kanji characters usually appearing in newspapers is on the order of 2500, and will be nearly 5000 when characters which are only infrequently used are included.
Because the kanji and kana compounded sentence includes a number of characters, when such sentence is to be transmitted by telecommunication, it is converted into a CO-59 code formulated by the assignee of the present invention in which a character is represented by a pair of trains each comprising six bits.
(Such code is commonly referred to as Kanji code, but will be quoted as Japanese code for the purpose of convenience, since the code can be utilized to encode kanji, hiragana, katakana, Arabian figures, Roman alphabet and other symbols).
Alternatively, a kanji and kana compounded sentence may be rewritten into a kana sentence before the latter is transmitted on a communication channel. In this instance, a kana code employing a six bit train for each character, for example, a coding scheme defined in Japan Industrial Standards JISC-0803, may be used. Because a Japanese character cannot be directly encoded into the five bit code which is employed in international communication, a kanji and kana compounded sentence is written into an expression, commonly referred to as a Roman expression of Japanese language, using a Roman alphabet (International Telegraph Alphabet No. 2). Thus, to maintain communication between a governmental organization or a civil enterprise and its overseas branch or office, it is necessary to perform translation work for converting a Roman expression transmitted from an overseas correspondent into a kanji and kana compounded sentence. However, the work is very troublesome because a Roman expression is not easily legible. Such work amounts to an enormous degree in the press, where the quantity of communication with overseas offices is high and a high accuracy is demanded for the conversion to assure the reliability of articles. The scale of the required labor can be appreciated from the demand for telegraphs and Telexes with overseas correspondents, which amounted to nearly 29.1 billion Japanese yen for one way communication, about one half of which is spent on Roman expressions.
In these circumstances, it will be understood that it is a national task to provide a system which receives Roman expressions transmitted in the international code and outputs kanji and kana compounded sentences by conversion of the international code into the Japanese code. On the other hand, where kanji and kana compounded sentences are transmitted in the form in which they are encoded according to the Japanese code, it is necessary to operate a very complicated keyboard having a number of character keys and shift keys.
Therefore, it is an urgent task to provide a kana code keyboard having a reduced number of keys and which permits the transmission of a kanji and kana compounded sentence to its correspondent.
Definition
To help the understanding of the present invention, the terminology will be defined or explained below. However, some special terms will be defined where they appear.
Japanese language: a single language usually used in Japan.
Roman expression or sentence of Japanese language: a sentence represented in Roman alphabet of the phonation of a Japanese language; in particular a sentence divided into clauses.
Kanji: this is commonly translated into the English expression "Chinese character" but is different from the actual Chinese character. For this reason, it is denoted herein as "Kanji" which represents a phonetical expression. A kanji has been generated in Japan, accepting characters used in the Chinese language either directly or partly modified, and is a logograph developed from hieroglyph and ideograph. It is to be understood that the kanji includes Japanese numerals, as distinguished from the Roman numerals. Commonly, a kanji has two syllables. A particular kanji is denoted herein by square brackets [ ] in which a corresponding English translation is inserted. Hence, whenever the square brackets [ ] appear, it should be understood that they represent a kanji or a plurality of kanji having the connotation of the translation.
Hiragana: this is a syllable character which is formulated by simplifying a kanji of the particular writing style. A particular hiragana is denoted herein by parentheses ( ) in which a denotation listed in the attached Table 2 is entered. It should be understood that such manner of denoting hiragana is merely for the purpose of convenience, and that whenever such a denotation appears in parentheses, it represents a corresponding hiragana.
Katakana: this represents a syllable character similarly formed by simplifying a spelling of a kanji, and generally has a one to one correspondence with a hiragana except for a very few characters. A particular katakana is denoted herein by (( )), double parentheses, in which a phonetically equivalent English or other foreign language word is entered. Therefore, whenever (( )) appears, it should be understood that it represents a katakana having the phonation in the Japanese fashion which corresponds to the word entered therein.
Logograph: a character having a significance or meaning.
Syllable character: a character having one syllable as a component of a word.
Usually a single syllable character represents one syllable.
Independent word: a word signifying a concept by itself and represents an object with respect to a subject. It is capable of constituting a component of a sentence alone.
Adjunctive: a word representing various manners of the subject with respect to the object, and is capable of expressing a specific thought only when it is combined with an independent word.
Clause: a minimum unit of a sentence which is punctuated by the natural pronunciation. It may comprise only independent words or may comprise a combination of an independent word and an affix such as prefix or postfix or a combination of an independent word and an adjunctive. (See subparagraph (7) of the paragraph "Denotation of Japanese language" for detail). It is to be noted that a clause may refer herein to a particular clause itself or a code signal indicative of a particular clause, namely, an assembly of coded signals indicative of individual characters appearing in the clause.
Roman sentence divided into clauses: a Roman sentence comprising a succession of Roman alphabet characters corresponding to the phonation of each clause followed one after another with a one character space between adjacent clauses. Generally each clause has a kanji word which is placed at the top of a clause. A clause may also refer to a portion of a train of the characters between two adjacent spaces even though they may not be denoted according to the rules described above. As with the "clause", this may refer to a coded signal representing a Roman sentence divided into clauses.
Affix: this is not used alone but is added to the top or the end of a word to extend or modify the meaning of the word. A prefix is added to the top while a postfix is added to the end of a word.
Stem: the part of an inflected word that remains unchanged or which cannot be decomposed any further without the loss of the meaning. It is to be noted that the Japanese language has the nature of the agglutinative language in the similar manner as the Finnish or Turkish language.
Denotation of Japanese Language
To help the understanding of the present invention, some features relating to the denotation of Japanese language will be briefly described below.
(1) Characters
As mentioned above, the Japanese language employs five kinds of characters, namely, kanji, hiragana, katakana, Roman alphabet and Arabian figures.
(2) Denotation element
When a minimum block in a character train which cannot be split any further from the standpoint of phonetical succession is referred to as a denotation element, a single character of kanji represents one denotation element though there are very few exceptions. A kana, inclusive of both hiragana and katakana, most frequently forms one denotation element with a single kana character, but sometimes two or more kana characters form together a single denotation element.
(3) Denotation symbols
A small circle for emphasis, underscoring, parentheses, punctuation, question mark, exclamation mark, space, indentation and other symbols are used generally in the similar manner as in the English language.
(4) The orientation of character trains
The Japanese language can be written in either vertical or horizontal rows.
Except for the use of vertical rows in the press, magazines and literature, writing in horizontal rows is increasingly used. When writing in vertical rows, the characters proceed from top to bottom while the rows proceed from right to left.
When writing in horizontal rows, the characters proceed from left to right while the rows proceed from top to bottom.
(5) Use of various kinds of characters
Kanji: principally used to indicate the concept of an independent word.
Hiragana: used as pronoun, inflection, auxiliary verb, particle and for other phonetical expression.
Katakana: used to represent a foreign language, a foreigner, a foreign land, the name of animals and plants, onomatopoeic words and for the simulation of other phonations.
Latin alphabet: used as the abbreviation for the metering system such as MKS or CGS units, and the abbreviation to represent a proper noun appearing in the foreign language such as an agency, formed by the initial words.
Arabian figures: used to represent the amount or quantity principally used when writing the sentences in the horizontal rows, but may be used in vertical rows also.
(6) Relationship between character trains and phonetic series.
Taking the English language, for example, there are several character series which correspond to a single phonetic series as indicated below.
character train    meaning     phonetic series
Cite             - (Cite)    \
Site             - (Site)    - Sait
Sight            - (Sight)   /
In the Japanese language, there are the following patterns which represent such relationship.
(a) A plurality of character series ↔ single meaning ↔ single phonetic series.
(b) A single character train ↔ single meaning ↔ plurality of phonetic series.
(c) A single character train ↔ plurality of meanings ↔ a plurality of phonetic series.
(d) A plurality of character trains ↔ plurality of meanings ↔ a single phonetic series.
(This corresponds to the example given above).
A homonym represented by the pattern (d) is frequently found with kanji which is a logograph.
(7) Composition of words in a sentence
Several typical constructions of words in clauses of the Japanese language are given below. In the following examples, the terms "pre-word" or "post-word" are coined to represent a word capable of being catenated with the head or end of an independent word.
a. Prefix + pre-word + independent word + post-word + postfix + a group of adjunctives.
b. Prefix + independent word + post-word + postfix + a group of adjunctives.
c. Prefix + independent word + post-word + a group of adjunctives.
d. Prefix + independent word + post-word.
e. Prefix + independent word + a group of adjunctives.
f. Prefix + independent word.
g. Pre-word + independent word + post-word + postfix + a group of adjunctives.
h. Pre-word + independent word + post-word + a group of adjunctives.
i. Pre-word + independent word + a group of adjunctives.
j. Independent word + post-word + postfix + a group of adjunctives.
k. Independent word + post-word + a group of adjunctives.
l. Independent word + post-word.
m. Pre-word + post-word + postfix + a group of adjunctives.
n. Pre-word + post-word.
o. Independent word + postfix.
p. Independent word.
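The constructions a to p above can be held as a small lookup table. The following is only a minimal sketch: the tag names (`"prefix"`, `"pre-word"`, `"independent"`, and so on) and the function `match_construction` are hypothetical names chosen for illustration and do not appear in the specification.

```python
# Hypothetical tags for the word varieties used in constructions a to p.
PATTERNS = {
    "a": ("prefix", "pre-word", "independent", "post-word", "postfix", "adjunctives"),
    "b": ("prefix", "independent", "post-word", "postfix", "adjunctives"),
    "c": ("prefix", "independent", "post-word", "adjunctives"),
    "d": ("prefix", "independent", "post-word"),
    "e": ("prefix", "independent", "adjunctives"),
    "f": ("prefix", "independent"),
    "g": ("pre-word", "independent", "post-word", "postfix", "adjunctives"),
    "h": ("pre-word", "independent", "post-word", "adjunctives"),
    "i": ("pre-word", "independent", "adjunctives"),
    "j": ("independent", "post-word", "postfix", "adjunctives"),
    "k": ("independent", "post-word", "adjunctives"),
    "l": ("independent", "post-word"),
    "m": ("pre-word", "post-word", "postfix", "adjunctives"),
    "n": ("pre-word", "post-word"),
    "o": ("independent", "postfix"),
    "p": ("independent",),
}


def match_construction(varieties):
    """Return the label of the listed construction equal to the sequence, else None."""
    key = tuple(varieties)
    for label, pattern in PATTERNS.items():
        if pattern == key:
            return label
    return None
```

For instance, `match_construction(["independent", "post-word"])` gives `"l"`, while a sequence that is not among the listed constructions gives `None`.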
Summary of the Invention
It is a general object of the invention to provide a system for translating a Japanese expression represented in terms of a specific kind of characters (syllable characters) into a Japanese expression using a different kind of characters including logograph characters.
It is a specific object of the invention to provide a translation system which receives an input signal indicative of a Roman sentence of Japanese language and converts or translates it into a kanji and kana compounded sentence which is commonly used in Japan, to be derived as an output.
It is another object of the invention to provide a translation system having a simple adaptor which permits a kana sentence of Japanese language as an input to be converted into a kanji and kana compounded sentence.
It is a further object of the invention to provide a translation system in which the formulation of an input sentence is not limited to any special denotation rule, but any input sentence formulated according to a standard Japanese denotation rule (division of a sentence into clauses) can be accepted.
It is still another object of the invention to provide a translation system which facilitates an accurate selection of homonyms or homographs required when translating syllable characters (Roman alphabet and kana) into logograph characters (kanji).
The fundamental operation of the translation system according to the invention can be briefly summarized as follows: A Roman sentence of Japanese language which is divided into clauses is first converted into a kana sentence, and (after clauses or words belonging to a predetermined group are excluded) a retrieval is made, from an independent word group registered or stored in a vocabulary memory, of independent words having a phonation which entirely or partly corresponds to a character train in the respective clauses of the converted kana sentence. The retrieval takes place by searching for a word having a phonation which fully or partly corresponds to a particular character train in the clause in a manner such that a scan of the memory is initially made to find a word which fully corresponds to the character train of a particular clause, and if the scan fails, an additional scan or scans are made to find any word having the phonation which corresponds to a modified character train which is formed by successively removing one character from the end of the original or the previous character train. To enable such a retrieval, vocabulary information including grammatical data such as the part of speech, the variety (indicative of independent word, adjunctive or affix), classification, meaning and concept or the like of the particular word as well as the Japanese code of that word is registered in the vocabulary memory together with an index word which comprises a kana representation of that word. It will be seen that when a word is to be retrieved having the phonation which fully or partly corresponds to the character train of the particular clause, this means that vocabulary information is retrieved by an index word which fully or partly corresponds to the character train whenever such index word is present.
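The retrieval by successive shortening of the character train can be sketched as follows. This is an illustrative sketch only: the memory is modelled as a Python dictionary, the index words are shown as romanized placeholders rather than kana (so one "character" here is one letter), and the entries and the name `retrieve` are assumptions, not taken from the specification.

```python
from typing import Optional, Tuple

# Toy vocabulary memory: index word -> vocabulary information.
# All entries are invented for illustration.
VOCABULARY = {
    "toukyou": {"pos": "noun", "japanese_code": "[Tokyo]"},
    "tou": {"pos": "noun", "japanese_code": "[tower]"},
}


def retrieve(train: str, memory: dict) -> Optional[Tuple[str, dict, str]]:
    """Scan for the full character train first, then for ever shorter trains.

    Each failed scan removes one character from the end of the train.
    Returns (index_word, information, remaining_characters), or None
    when no scan succeeds.
    """
    for end in range(len(train), 0, -1):
        candidate = train[:end]
        if candidate in memory:
            return candidate, memory[candidate], train[end:]
    return None
```

With these toy entries, `retrieve("toukyoude", VOCABULARY)` finds the index word `"toukyou"` and leaves `"de"` as the remaining character train, which would then be offered to the adjunctive memory.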
When a vocabulary information is retrieved having an index word which partly corresponds to the character train of a particular clause, a scan of an adjunctive memory is made to find an adjunctive which corresponds to the remaining characters or character train of that clause. Adjunctive words are registered in the adjunctive memory in the same manner as the words are registered in the vocabulary memory. The retrieval from the adjunctive memory takes place in the same manner as that made from the vocabulary memory. When the scan fails to retrieve an adjunctive information having an index word which coincides with the remaining characters or character train, the remaining characters or character train is again fed to the vocabulary memory where it is used for retrieval of a vocabulary information in the same manner as mentioned above. When a vocabulary information is retrieved from the vocabulary memory having an index word which partly coincides with the remaining characters or character train, those characters or character train which has been removed during the scan of the second vocabulary information is again fed to the adjunctive memory to find an adjunctive information from the adjunctive memory. By repeating the retrieval of the vocabulary information and adjunctive information in this manner, the particular clause will be evaluated either as a single word, a combination of a plurality of words, a combination of a single word and adjunctive word or words or a combination of a plurality of words and adjunctive word or words. For these character trains or characters, there will follow corresponding vocabulary information and/or adjunctive information.
However, when a particular clause comprises a combination of a plurality of words, an examination is made if the word which has been evaluated as the initial word can be grammatically catenated with the next word, by comparing the part of speech information contained in the vocabulary information of that word against a dictionary stored in a dictionary memory. If the comparison fails, the evaluation of that word is tried again. The dictionary defines parts of the speech which can be grammatically catenated with each part of the speech. When the two initial words in a particular clause cannot be catenated together by a consideration of the parts of speech, the last character in the character train from which the initial word is evaluated is removed therefrom to establish a new character train, which is then used for the retrieval of a vocabulary information having an index word which fully or partly coincides with the new character train. Then an adjunctive information having an index word which fully or partly coincides with the remaining character train of the clause is retrieved from the adjunctive memory. The procedure is repeated until a second word is found which can be grammatically catenated with the initial word.
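The interplay of the vocabulary memory, the adjunctive memory and the part-of-speech dictionary described above can be approximated by a backtracking search. The sketch below is a simplification under stated assumptions: the two memories and the catenation dictionary are toy Python dictionaries with invented entries, index words are romanized placeholders, and the names `candidates` and `analyse` are hypothetical. It is not the circuit arrangement of the specification.

```python
# Toy memories: index word -> part of speech (invented entries).
VOCABULARY = {"kyou": "noun", "kyouto": "noun"}
ADJUNCTIVES = {"ni": "particle", "toshite": "particle"}
# Hypothetical dictionary: part of speech -> parts of speech that may follow it.
CATENATION = {"noun": {"particle"}, "particle": set()}


def candidates(train, memory):
    """Yield (index_word, pos) for prefixes of the train, longest first."""
    for end in range(len(train), 0, -1):
        if train[:end] in memory:
            yield train[:end], memory[train[:end]]


def analyse(train, previous_pos=None):
    """Split a clause into words whose parts of speech can be catenated.

    Returns a list of (index_word, pos), or None if no reading is found.
    """
    if not train:
        return []
    # The initial word is sought in the vocabulary memory; remaining
    # characters go to the adjunctive memory first, then back to vocabulary.
    memories = [VOCABULARY] if previous_pos is None else [ADJUNCTIVES, VOCABULARY]
    for memory in memories:
        for word, pos in candidates(train, memory):
            if previous_pos is not None and pos not in CATENATION[previous_pos]:
                continue  # cannot follow the previous word
            rest = analyse(train[len(word):], pos)
            if rest is not None:
                return [(word, pos)] + rest
    # Returning None makes the caller retry with a shorter character train.
    return None
```

For the train `"kyoutoshite"`, the longest initial word `"kyouto"` leaves `"shite"`, which matches nothing, so the evaluation is tried again with the shorter train `"kyou"` and the remaining `"toshite"` is found in the adjunctive memory.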
When it is determined that a particular clause comprises a single word or a combination of a plurality of words which can be catenated together as considered from the standpoint of the grammar, another examination is then made to see if the particular clause can be grammatically catenated with the adjoining clause. This examination takes place by collating the part of speech and/or the variety data contained in the vocabulary or adjunctive information of the last word in the particular clause and the part of speech and/or the variety data contained in the vocabulary or adjunctive data of the initial word of the adjoining clause against the dictionary.
After these examinations, an examination is made to see if the construction of words in the entire particular clause is proper as considered from the standpoint of combination of the varieties, by comparing the variety data in the vocabulary or adjunctive information of the respective words against a second dictionary stored in a second dictionary memory.
When a series of grammatical examinations are completed, the index words and associated information of the individual words of the respective clauses are deleted, leaving the Japanese codes of the information, thus allowing the respective words to be represented by the kanji's or kana's of the Japanese code.
One of the features of the translation system according to the invention resides in the fact that subsequent to the conversion of a Roman sentence into a kana sentence, when a retrieval of words is made which fully or partly correspond to the character trains of the individual clauses to derive the vocabulary or adjunctive information which includes the index words, represented in kanas and indicative of these words, and which also includes the grammatical information and the Japanese code of these words, the words are substituted by the kanji's or kana's of the Japanese code after the grammatical catenation capability between individual words and between individual clauses is examined in accordance with the grammatical information. The grammatical information may include any suitable data such as the part of speech, the variety, the classification, the meaning or the like. The association of a plurality of different kinds of grammatical information with the individual words permits an examination of the grammatical catenation capability from various aspects, enabling a choice of a correct word from a number of homonyms.
In a preferred embodiment of the invention, a Roman sentence is encoded by the international code (covering Roman alphabet, Arabian figures and symbols) in which a character is represented by a set of five bits. When converting the Roman sentence into a kanji and kana compounded sentence, one bit is uniformly added to the international code to form a buffer code or pseudo international code in which a character is represented by a set of six bits, for conversion into a kana sentence in a six bit character kana code which covers hiragana, katakana, Roman alphabet, Arabian figures and symbols. After a series of processing steps are performed on the kana sentence which is represented in the kana code, the kana sentence is converted into a kanji and kana compounded sentence according to the Japanese code which covers kanji, hiragana, katakana, Roman alphabet, Arabian figures and symbols and in which a character is represented by two sets each comprising six bits. However, it is to be understood that the buffer code and kana code may be modified to accommodate the number of bits of any newly proposed Japanese codes in which a character is represented by two trains each comprising seven bits or eight bits. A code listed in JISC-0803 or any other coding scheme may be used for the kana code.
It is to be understood that the formulation of any kana code has no direct bearing on the present invention and is within the skill of those versed in the art. Additionally, where the system of the invention receives a Roman sentence, a hiragana sentence or katakana sentence in which the characters are encoded by six bits per character but according to an encoding scheme other than the buffer code, a code converter may be provided to convert such input sentence into the buffer code.
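The code widths described above can be illustrated with simple bit arithmetic. This sketch assumes that the added bit occupies the most significant position and defaults to zero; the specification only says that one bit is uniformly added, so both the position and the value are assumptions, as are the function names.

```python
def to_buffer_code(code5: int, added_bit: int = 0) -> int:
    """Widen a five-bit international code to a six-bit buffer code.

    Assumption: the added bit is placed in the most significant position.
    """
    if not 0 <= code5 < 32:
        raise ValueError("international code must fit in five bits")
    if added_bit not in (0, 1):
        raise ValueError("added bit must be 0 or 1")
    return (added_bit << 5) | code5


def pack_japanese_code(first: int, second: int) -> int:
    """Combine two six-bit sets into one 12-bit value for a single character."""
    for part in (first, second):
        if not 0 <= part < 64:
            raise ValueError("each set must fit in six bits")
    return (first << 6) | second


def unpack_japanese_code(value: int):
    """Split a 12-bit value back into its two six-bit sets."""
    return (value >> 6) & 0x3F, value & 0x3F
```

Two six-bit sets give 64 x 64 = 4096 code points per character, against 32 for the five-bit international code and 64 for the six-bit buffer and kana codes.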
Brief Description of the Drawing and the Tables
Figs. 1a, 2a, 3a and 4a are specific examples of input sentences fed to the system of the invention;
Figs. 1b, 2b, 3b and 4b are output sentences derived by the system of the invention in response to the input sentences shown in Figs. 1a to 4a;
Fig. 5 is a schematic block diagram illustrating the general arrangement of the system according to the invention;
Fig. 6 is a block diagram of the input stage shown in Fig. 5;
Fig. 7 is a block diagram of the reception and reader section shown in Fig. 6;
Fig. 8 is a block diagram of the input sentence identification section shown in Fig. 6;
Fig. 9 is a block diagram of the input sentence splitting section, the specific cipher-to-final output word conversion section and the specific word-to-intermediate or final output word conversion section shown in Fig. 6;
Fig. 10 is a block diagram of the Roman character-to-kana conversion stage shown in Fig. 5;
Fig. 11 is a block diagram of the splitting and combining section shown in Fig. 10;
Fig. 12 is a block diagram of the functional word processing section shown in Fig. 10;
Fig. 13 is a block diagram of the numeral and symbol processing section shown in Fig. 10;
Fig. 14 is a block diagram of the Roman character-to-kana conversion section shown in Fig. 10;
Fig. 15 is a block diagram of the kana-to-kanji conversion stage shown in Fig. 5;
Fig. 16 is a block diagram of the kana clause input section and the clause splitting section shown in Fig. 15;
Fig. 17 is a block diagram of the vocabulary memory shown in Fig. 15;
Fig. 18 is a block diagram of the independent word/adjunctive catenation examining section shown in Fig. 15;
Fig. 19 is a block diagram of the independent word/non-adjunctive catenation examining section shown in Fig. 15;
Fig. 20 is a block diagram of the word construction determination section in Fig. 15;
Fig. 21 is a block diagram of the meaning information examining section shown in Fig. 15;
Fig. 22 is a block diagram of the two word determination section shown in Fig. 15;
Fig. 23 is a block diagram of the kana-to-kanji conversion section shown in Fig. 15;
Fig. 24 is a block diagram of the output stage shown in Fig. 5;
Fig. 25 is a block diagram showing a fragmentary modification of the system of the invention; and
Table 1 shows the correspondence between Roman characters and hiragana;
Table 2 shows the relationship between hiragana characters and their corresponding denotation characters which are used herein for the convenience of the description;
Table 3 shows the relationship between input symbols and their corresponding output symbols;
Table 4 represents the relationship between certain input words and their corresponding output words;
Table 5 shows a portion of a table of grammatical information stored in the grammatical information memory;
Table 6 shows a portion of a table of vocabulary information stored in the vocabulary memory;
Table 7 shows a portion of a table of adjunctive information stored in the grammatical information memory; and
Tables 8, 9, 10, 11, 12, 13, 14, 15 and 16 are tables which illustrate specific examples of vocabulary information stored in the vocabulary memory.
Detailed Description of Preferred Embodiment
The present invention relates to a general system for converting a Roman sentence of Japanese language or a kana sentence of Japanese language into a kanji and kana compounded sentence. In the embodiment to be described below, the invention will be described in terms of a system for converting a Roman sentence of Japanese language into a kanji and kana compounded sentence which may be utilized in the press. The conversion of a kana sentence of Japanese language into a kanji and kana compounded sentence will be described later under the paragraph 8 "Modification".
1. Input and output sentences
(1) Input sentences
A message sent by an overseas correspondent to his principal office located in Japan is usually written in a form which can be directly carried in a newspaper as an article. Thus, as illustrated in Figs. 1a to 4a, an input sentence 1 fed to the system of the invention is formulated in a given format. As shown, the input sentence 1 includes a header 2, text 3 and trailer 4, each of which is represented by Roman alphabet, Arabian figures and predetermined symbols the use of which is admitted in international communication. The header 2 includes a beginning of message cipher 2a, an address cipher 2b, an origination cipher 2c, a message number 2d, a message date 2e, a message priority cipher 2f, a title 2g of the content contained in the text 3 (written in Roman sentence of Japanese language), and a correspondent or originator name 2h. The text 3 is written as Roman sentences of Japanese language. The trailer 4 includes a word 4a which indicates whether the message is terminated by the end of the text 3 or followed by the next message, and also includes an end of message cipher 4b.
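The format of the input sentence can be pictured as a simple record structure. The following sketch only mirrors the parts named above (reference numerals in the comments); the class and field names are hypothetical, every field is held as plain text, and nothing here reflects the actual ciphers or layout shown in the figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Header:                   # header 2
    beginning_of_message: str   # cipher 2a
    address: str                # cipher 2b
    origination: str            # cipher 2c
    message_number: str         # 2d
    message_date: str           # 2e
    priority: str               # cipher 2f
    title: str                  # 2g, a Roman sentence of Japanese language
    originator: str             # 2h


@dataclass
class Trailer:                  # trailer 4
    continuation_word: str      # word 4a: terminated, or followed by next message
    end_of_message: str         # cipher 4b


@dataclass
class InputSentence:            # input sentence 1
    header: Header
    text: List[str]             # text 3: Roman sentences divided into clauses
    trailer: Trailer

    def clauses(self) -> List[List[str]]:
        """Split each Roman sentence of the text at the one character spaces."""
        return [sentence.split(" ") for sentence in self.text]
```

Splitting at single spaces follows the definition of a Roman sentence divided into clauses given earlier.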
The input sentence 1 is encoded into a five bit coded signal by a keyboard unit of a known form, and the coded signal is fed to the system of the invention through a suitable communication channel or by way of a suitable recording medium. The system is generally shown in Fig. 5, where it will be noted that it comprises an input stage 10, a Roman character-to-kana conversion stage 20, a kana-to-kanji conversion stage 30, and an output stage 40 which includes a suitable form of printer capable of printing a kanji and kana compounded output sentence 6 in accordance with the input sentence 1 which is spelled in Roman characters.
(2) Functional words used in the formulation of transmission of the input sentence
The system is designed to permit the following functional words to be used in the input sentence:
"LX": Added to the end of a word which is to be ultimately converted into katakana (see paragraph "Denotation of Japanese Language"). The functional word "LX" is added by the system when a word spelled in the original foreign language (other than the Roman expression of the Japanese language) is used, and need not be added upon entry.
"VX": Added to the end of a word or abbreviation which is to be ultimately represented in Roman characters (see paragraph 5 of "Denotation of Japanese Language").
"NX": Added to the end of a numeral which is to be ultimately represented in Arabic figures.
"XX": Added to a clause or word which is misspelled by an inadvertent operation of the keyboard and which is to be cancelled. This functional word may comprise more than two X's in succession.
"XXX": This functional word, comprising three X's in succession, is added one space after the end of an inputted clause when the deletion of that clause is found desirable after the space has been keyed in. Adding a further "XXX", separated by one character space from the initial "XXX", permits the deletion of the clause preceding the deleted one. In a similar manner, a third preceding clause can be deleted.
"LINEDEL": Added, after one character space, to the end of a line whenever the clauses in that line are to be deleted entirely.
"PARA": Added to the end of a line before a line feed, after a space corresponding to one or more characters, whenever an indentation is desired.
A space at the beginning of a line corresponding to more than two characters: Alternative method to provide an indentation.
(3) Output sentences
Output sentences 6 corresponding to the input sentences 1 shown in Figs. 1a to 4a are shown in Figs. 1b to 4b. In the examples shown, the output sentences are formatted as a newspaper article, and thus the output sentences 6 are in vertical rows (see paragraph 4 of "Denotation of Japanese Language"). It will be noted that the number of characters per row is chosen to be 15, which represents the maximum number of characters printable in one row of a standard vertical column of a Japanese newspaper. The translation of the individual output sentences 6 is given below.
Fig. 1b
Foreign News WO103 241045YY Increasing Dependency on Middle East Likely [Washington 24th Kyodo Correspondent Naka] The Joint Atomic Committee of the Upper and Lower Houses of the United States published, on the 23rd, a report relating to the perspective of the energy situation of the United States for the future ten years.
Based on the assumption that the annual increase rate of the energy demand will be between 2.8 and 3% for successfully achieving the energy saving without degrading the living standards and the economic growth rate to undesirable levels, the report forecast that "the import of petroleum from the Middle East and North Africa for the future two years will increase by an amount from 20% to 50% of the total petroleum import in 1974".
Fig. 2b
Foreign News BR124 171225YY Unemployment [Brussels 17th Kyodo Correspondent Inoue] The European Community (EC) Committee has published the gross figure of unemployment within the district as of the end of April.
It indicates a gross figure of unemployment of 5,194,950, a reduction of nearly 190 thousand as compared with March (5,387,067).
With a return to prosperity, the figure of unemployment in the entire EC has continually tended to decrease from the beginning of this year, while a slight increase is observed in Luxemburg in April. Uniform reduction is observed in other countries, about one hundred thousand in West Germany and about forty thousand in France.
As compared with the same season of last year, there is still an increase in unemployment of as much as 750 thousand, but the unemployment will undoubtedly fall below the level of five million in May and June if the rate of [reduction phenomenon] continues.
Fig. 3b
Foreign News LO243 131420YY All British Open Golf [London 13th Kyodo Correspondent Tsuda] Suzuki, the first Japanese player to acquire the tenth rank in the All British Open Golf of 1976, which has a brilliant history of 105 games, returned home in high spirits on the 12th. He said that "ranking in the 10th place inspired in me a fighting spirit for golf. I am surprised at the enthusiasm with which the British people play golf. I would like to challenge the four major tournaments of the world and games of PGA "PRO GOLFER'S ASSOCIATION"".
Fig. 4b
Foreign News NY147 162015YY Petroleum Crisis [New York 16th Kyodo Correspondent Fujita] A conference of party leaders has been held on the petroleum crisis in the State of Florida in recognition of the risk that the one-sided actions of the underdeveloped countries may bring the worldwide economy into confusion.
Debate has been made on avoiding crossing the dangerous bridge, "the issue of economical assistance", until the development of a substitute energy for petroleum to establish [setup or system] to settle down the petroleum prices.
It appears that, to get through the "transitional period" without troubles, a conclusion has been reached to "make some concession" to the debt of assistance and primary products in return for the stabilization of the energy issue. (Continued.)
2. Input stage
(1) Summary
Referring to Fig. 6, the input stage 10 shown in Fig. 5 comprises a reception and reader section 11, an input sentence identification section 12, an input sentence splitting section 13, a specific cipher-to-final output word conversion section 15, and a specific word-to-intermediate or final output word conversion section 14. The principal function of the input stage 10 is to receive an input sentence in the form of a Roman sentence of Japanese language in which each character is encoded by a set of five bits according to the international code, to convert the five bit signal into a six bit signal, to decode a cipher of a particular group in the input sentence into a Japanese code in the form of two sets of six bits representing a final output word defined by the cipher, and to convert a word of a particular group in the input sentence into a corresponding kana code of six bits representing an intermediate or final output word having the meaning corresponding to the word. The term "final output word" refers to a word in the form of a character or letter which belongs to the character family of the final output. The term "intermediate output word" refers to a word having the same phonation as the final output word but formed by characters or letters of a different character family. In this connection, it is to be noted that the terms "letter", "character", "word", "vocabulary", "clause", "phrase", "sentence" and "information", or any specific example thereof which appears in quotation marks " ", refer either to the term itself or to a single instance of such a term. Wherever a confusion between the two uses is likely, a reference in the latter form will not be made. However, both uses will be made frequently as far as this does not interfere with the understanding of the invention.
(2) Reception and reader section (Fig. 7)
The reception and reader section 11 comprises a five bit paper tape reader 85 and a reception circuit 86, so that it is capable of receiving both an on-line and an off-line input. An on-line input is applied to the reception circuit 86 from a suitable communication channel, and is then fed to a serial-parallel converter 82 and thence to a one bit addition circuit 83. An off-line input is supplied to the tape reader 85 in the form of encoded perforations in a paper tape. The reader 85 has a reader circuit 81 which reads the encoded tape and feeds the read signal to the one bit addition circuit 83. A signal from either channel is applied to the one bit addition circuit 83 as parallel digital signals representing the five bits, and is converted therein into a parallel six bit signal, which will be referred to herein as the "pseudo international code" to distinguish it from the five bit signal formatted according to the international code. The six bit signal is fed to an output register 84 to be supplied to the input sentence identification section 12.
(3) Input sentence identification section (Fig. 8)
The input sentence identification section 12 receives one input sentence starting with the beginning of message cipher 2a and ending with the end of message cipher 4b, eliminating noise which precedes or follows the input sentence.
The signal from the output register 84 of the section 11 is fed to an input register 91, and is then fed therefrom to both a character distinguishing circuit 92 and an extraneous character removal circuit 93, which is responsive to a command from the circuit 92 to eliminate noise interposed between the input sentences. In this manner only effective signals are fed through an output register 94 to the input sentence splitting section 13.
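The effect of the identification section can be pictured with a minimal Python sketch. It is an illustration only and not the circuit of Fig. 8: the function name extract_messages and the marker strings "ZCZC" and "NNNN" are assumptions, since the text does not state the actual beginning and end of message ciphers.

```python
def extract_messages(stream, begin_cipher, end_cipher):
    """Return every span that starts with begin_cipher and ends with
    end_cipher, discarding whatever lies before, between or after them."""
    messages = []
    pos = 0
    while True:
        start = stream.find(begin_cipher, pos)
        if start == -1:
            break  # no further beginning of message cipher
        end = stream.find(end_cipher, start + len(begin_cipher))
        if end == -1:
            break  # incomplete message: treated as noise in this sketch
        end += len(end_cipher)
        messages.append(stream[start:end])
        pos = end
    return messages


# Hypothetical markers, chosen only for the example.
noisy = "@@ZCZC GAI WO103 NNNN~~~ZCZC GAI BR124 NNNN??"
print(extract_messages(noisy, "ZCZC", "NNNN"))
```

Only the two complete messages survive; the characters around them play the role of the noise removed by circuit 93.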
(4) Input sentence splitting section (Fig. 9)
The input sentence splitting section 13 comprises an input register 101, a distinguishing and splitting circuit 102, and a six bit paper tape punch 109. When a signal is inputted through the input register 101 to the distinguishing and splitting circuit 102, the latter checks whether the input sentence 1 is formatted in the given form. If the format is improper, the circuit bypasses the input sentence to the tape punch 109; if the format is proper, it supplies the signal to the specific cipher-to-final output word conversion section 15. The signal fed to the conversion section 15 begins with the beginning of message cipher 2a and ends with the priority cipher 2f. However, that portion of the signal which begins with ":" (colon) immediately following the priority cipher 2f and extends to the end of message cipher 4b is fed to the specific word-to-intermediate or final output word conversion section 14.
(5) Specific cipher-to-final output word conversion section (Fig. 9)
This conversion section decodes the address cipher 2b and the origination cipher 2c which appear in the header 2 of the input sentence. Specifically, if a signal of the pseudo international code which is indicative of the address cipher "GAI" is supplied as input, the conversion section provides as an output a signal of Japanese code employing two sets of six bits which represents the final output word [Foreign News] defined by this cipher. When a signal of the pseudo international code indicative of the origination cipher "WO" is supplied as input, the corresponding output is a signal of Japanese code which represents the output word ((WASHINGTON)) defined by the cipher. The cipher-to-final output word conversion section 15 comprises a cipher determination circuit 106, a cipher storage circuit 107 and an output register 108, as shown in Fig. 9. The cipher storage circuit 107 stores information corresponding to predetermined address ciphers and information corresponding to predetermined origination ciphers in an alphabetical sequence, for example, as follows:
Address information
GAI#KX [Foreign News]
SHA#KX [Social]
Origination information
LOKX ((LONDON))
WOKX ((WASHINGTON))
In the examples given above, the initial three letters of the address information and the initial two letters of the origination information represent their index word.
The character "#" in the address information represents a functional word indicating the deletion of this character as well as of the index word which precedes it during a processing to be described later. The letters "KX" represent a functional word which indicates that the data which follows these letters is encoded in the Japanese code during a subsequent processing. The functional word "KX" is used only within the system, and is not used during the input entry. The index word and the functional words "#" and "KX" are denoted by the Roman alphabet and symbols of the kana code. As mentioned previously under the paragraph "Definition", the square brackets [ ] indicate the presence of a kanji, without brackets, which has the meaning corresponding to the word or words indicated therein. Similarly, the double parentheses (( )) indicate the presence of a katakana, without double parentheses, which has the phonation corresponding to a Japanese pronunciation of the word or words entered therein.
That portion of the input sentence 1 fed from the distinguishing and splitting circuit 102 of the input sentence splitting section 13 which begins with the beginning of message cipher 2a and ends with the priority cipher 2f is successively supplied, cipher by cipher, to the cipher determination circuit 106 and the cipher storage circuit 107. When a cipher not stored in the storage circuit 107 is inputted, the determination circuit 106 directly passes that cipher to the output register 108.
However, when a cipher stored in the storage circuit 107 is inputted, the determination circuit 106 receives information from the storage circuit 107 to substitute it for the cipher. By way of example, considering the input sentence shown in Fig. 1a, when "GAI" appearing in the second line is inputted, this cipher has an address information "GAI#KX [Foreign News]" stored in the storage circuit 107, so that the address information is supplied to the determination circuit 106. The circuit 106 operates to delete the index word "GAI" together with "#" if the information supplied from the storage circuit 107 contains "#", and feeds "KX [Foreign News]" which follows "#" to the output register 108 instead of the input "GAI". When the cipher "WO" is inputted, the storage circuit 107 retrieves the origination information "WOKX ((WASHINGTON))" having the index word "WO", and supplies the information to the determination circuit 106. Since the information "WOKX ((WASHINGTON))" does not contain "#", the circuit 106 does not operate to delete the index word, but feeds this information to the output register 108 instead of the input "WO". The message number, the message date and the priority cipher which follow the origination cipher are not stored in the storage circuit 107, and hence are fed directly to the output register 108.
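The substitution rule described for the determination circuit 106 can be summarised in a short Python sketch. The stored strings are the examples quoted in the text; the dictionary layout, the function name convert_cipher and the rendering of the deletion mark as "#" are assumptions made for illustration, not the circuit itself.

```python
DELETE_MARK = "#"  # assumed rendering of the deletion functional word

# Stored information keyed by index word (examples taken from the text).
CIPHER_STORE = {
    "GAI": "GAI#KX [Foreign News]",
    "SHA": "SHA#KX [Social]",
    "LO": "LOKX ((LONDON))",
    "WO": "WOKX ((WASHINGTON))",
}


def convert_cipher(cipher):
    """Substitute stored information for a cipher, if any is stored."""
    info = CIPHER_STORE.get(cipher)
    if info is None:
        return cipher  # not stored: passed on unchanged
    if DELETE_MARK in info:
        # Drop the index word together with the deletion mark.
        return info.split(DELETE_MARK, 1)[1]
    return info  # no deletion mark: index word is kept


header = ["GAI", "WO", "103"]
print([convert_cipher(c) for c in header])
```

With the header ciphers of Fig. 1b, "GAI" becomes "KX [Foreign News]", "WO" becomes "WOKX ((WASHINGTON))", and the message number passes through untouched.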
(6) Specific word-to-intermediate or final word conversion section (Fig. 9)
The conversion section 14 converts those words of a particular group, which appear in the portion of the input sentence 1 which begins with ":" (colon) following the priority cipher 2f and ends with the end of message cipher 4b, into intermediate or final output words. As mentioned previously, an intermediate output word refers to a word having the same phonation as the final output word, but which is represented by a letter of a different character family from that associated with the final output word. The relationship between the words of the particular group and the converted words is given below.
a. The names of foreigners and foreign lands, as well as words of foreign languages, principally the English language, which are used as borrowed words or words of foreign origin, are converted into a Roman spelling corresponding to a Japanese phonation of the original word. (Conversion into an intermediate output word.)
Example
ENERGY → ENERUGII
AFRICA → AFURIKA
b. An abbreviation is converted into a final output word.
Example
EC → [European Community]
OECD → OECD
c. The name of correspondent or originator 2h is converted into a final output word.
Example
NAKA → [NAKA]
d. The word 4a indicating whether the text is terminated or continued is converted into a final output word.
Example
MORE → [MORE]
END → [END]
Referring to Fig. 9, the conversion section 14 comprises a specific word determination circuit 103, a specific word storage circuit 104 and an output register 105. It will be noted that its general arrangement is similar to the specific cipher-to-final output word conversion section 15. The circuit 104 stores information corresponding to predetermined specific words in an alphabetical sequence as follows:
AFRICA#AFURIKALX
EC#KX [European Community]
ENERGY#ENERUGIILX
END#KX [END]
MORE#KX [MORE]
NAKA#KX [NAKA]
OECD#OECDVX
PERCENT#PAASENTOLX
It will be noted that this information contains the functional words "#" and "KX" in a similar manner to the cipher information stored in the circuit 107, and these functional words serve the same purpose as before. However, the additional functional words "LX" and "VX" are used. These functional words "LX" and "VX" have been previously described under the paragraph "Input and output sentences", and are automatically added to appropriate words to which they were not added in the input sentence. The functional word "LX" indicates that information having this functional word added to the end thereof has a katakana as the final output word for the word located between "#" and "LX". The functional word "VX" indicates that information having this functional word added to the end thereof has Roman characters as the final output word for a word located between "#" and "VX". These index words and functional words are denoted by alphabet or symbols of the kana code, and the information proper is similarly denoted by alphabet of the kana code except when it appears after "KX".
The conversion section 14 operates in substantially the same manner as the specific cipher-to-final output word conversion section 15, and its operation will be illustrated with reference to the input sentence 1 shown in Fig. 1a. A portion of the signal which begins with ":" (colon) following the priority cipher 2f of the input sentence 1 and which ends with the end of message cipher 4b is fed, clause by clause, from the distinguishing and splitting circuit 102 of the input sentence splitting section 13 to the specific word determination circuit 103 and to the specific word storage circuit 104. When information corresponding to the inputted clause cannot be retrieved from the storage circuit 104, the determination circuit 103 passes that clause to the output register 105. However, when such information is retrieved from the storage circuit 104, it substitutes that information for the clause. Specifically, when "NAKA" representing the name of the correspondent and appearing in the fourth line is inputted to the determination circuit 103 and the storage circuit 104, information "NAKA#KX [NAKA]" is retrieved from the storage circuit 104 since the name of the correspondent is previously stored therein. This information is fed to the determination circuit 103, which operates to delete the index word "NAKA" together with "#" since the information contains "#", and supplies "KX [NAKA]" of Japanese code to the output register 105 instead of "NAKA" represented in the pseudo international code. Subsequently, the individual clauses of the text 3 are inputted, and whenever the inputted clause is found not to be a specific word, it is fed to the output register 105. When the clause "ENERGY" appearing in the sixth line is fed to the determination circuit 103 and the storage circuit 104, the registered information "ENERGY#ENERUGIILX" is retrieved from the latter and is supplied to the determination circuit 103.
Because the information contains "#", the determination circuit 103 deletes the index word "ENERGY" together with "#" and feeds only "ENERUGIILX" to the output register 105. After clauses which are not stored in the storage circuit 104 are sequentially fed through the determination circuit 103 to the output register 105, "PERCENT" appearing in the twelfth line is inputted. It is fed to the register 105 in the form "PAASENTOLX" by a procedure which is similar to those described above. When the clause "MORE" or "END" indicating the end of the text is fed to the determination circuit 103 and the storage circuit 104, the storage circuit 104 responds thereto by supplying "MORE#KX [Continued]" or "END#KX [Terminated]" in Japanese code to the determination circuit 103. Since the information contains "#", the circuit 103 deletes the index word "MORE" or "END", and supplies "KX [Continued]" or "KX [Terminated]" to the output register 105.
3. Roman alphabet character-to-kana conversion stage
(1) Summary
Referring to Fig. 10, the Roman character-to-kana conversion stage 20 comprises a splitting/combining section 21, a functional word processing section 22, a numeral and symbol processing section 23 and a hiragana conversion section 24. An input sentence reaching the conversion stage 20 has already had its specific ciphers and specific words converted into final or intermediate output words of either Japanese or kana code in the preceding input stage 10. In the conversion stage 20, the numerals and symbols in the input sentence are converted into final output words of the Japanese code before any remaining clause represented in Roman alphabet characters is converted into hiragana of the kana code. The conversion stage 20 also performs the indentation and the deletion of words or lines in accordance with the instructions defined by the functional words contained in the input sentence.
(2) Splitting/combining section (Fig. 11)
The splitting/combining section 21 receives signals from the output register 108 of the specific cipher-to-final output word conversion section 15 and from the output register 105 of the specific word-to-intermediate or final output word conversion section 14, both shown in Fig. 9, and either splits or combines a particular clause or clauses in the input sentence in order to facilitate the subsequent conversion of Roman characters into kana. The splitting of a particular clause and the combining of clauses will be briefly summarized.
Considering first the splitting of a clause, a check is made to see if a numeral or any one of predetermined symbols appears in the clause. If a numeral or such a symbol appears in the clause, that clause is split at a point between the numeral or symbol and its immediately adjacent Roman character, a space being automatically entered thereat.
By way of example, when the following clauses of the input sentence 1 shown in Fig. 2a are inputted to the splitting/combining section 21, its output is converted as indicated below. (Note in particular the location of the space.)
Input clause → Output clause
17HI → 17 HI (fifth line)
16HI, → 16 HI, (fifth line)
4GATUNI → 4 GATUNI (thirteenth line)
Considering next the combining of clauses, if a space is inadvertently entered before the functional word "LX" or "VX" in the input sentence so as to provide a demarcation unintentionally, the clauses are combined together to eliminate the space for the purpose of subsequent processing (see 13th line of Fig. 2a).
Input clause → Output clause
RUKUSENBURUKU LXDEWA → RUKUSENBURUKULXDEWA
When a space is interposed between a numeral and a word of the input sentence, they are combined together into one clause for the purpose of subsequent numeral processing. In the example of Fig. 2a, this takes place as follows:
Input clause → Output clause
519 MAN 4950NIN → 519MAN4950 NIN
538 MAN 7067 → 538MAN7067
The Roman sentence which is subjected to the splitting and/or combining step in the splitting/combining section 21 is then fed to the functional word processing section 22. It will be understood that clauses which do not require the splitting or combining step in the section 21 are directly fed to the processing section 22.
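Two of the rules summarised above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the circuit of Fig. 11: the function names are hypothetical, only the boundary between a digit and a letter is split (symbols are left alone), and the combining of numerals with numeral words such as "MAN" is not attempted.

```python
import re

# Matches the empty position between a digit and a capital letter,
# in either order.
_DIGIT_LETTER_BOUNDARY = re.compile(r"(?<=[0-9])(?=[A-Z])|(?<=[A-Z])(?=[0-9])")


def split_numerals(clause):
    """Insert a space wherever a digit touches a Roman letter."""
    return _DIGIT_LETTER_BOUNDARY.sub(" ", clause)


def join_functional_words(clauses):
    """Re-attach a clause that starts with "LX", "VX" or "NX" to the
    clause before it, undoing an inadvertently entered space."""
    joined = []
    for clause in clauses:
        if joined and clause.startswith(("LX", "VX", "NX")):
            joined[-1] += clause
        else:
            joined.append(clause)
    return joined


print(split_numerals("17HI"))     # 17 HI
print(split_numerals("4GATUNI"))  # 4 GATUNI
print(join_functional_words(["RUKUSENBURUKU", "LXDEWA"]))
```

The join relies on the observation that a clause in the Roman spelling of Japanese would not normally begin with "LX", "VX" or "NX"; that is an assumption of the sketch.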
Referring to Fig. 11, the splitting/combining section 21 will be more specifically described. As shown, it comprises an input register 111, a space determination and addition circuit 112, a space identification circuit 113, a two word register 114, an "LX", "VX" and "NX" determination circuit 115, an "LX", "VX" and "NX" identification circuit 116 and an output register 117.
The function of these circuits will be described by referring to the exemplary sentence given in Fig. 2a. The input sentence which has been partly processed by passing through the conversion sections 14 and 15 is fed into the circuit shown in Fig. 11, clause by clause, beginning with the beginning of message cipher 2a. When the signal "17HI" appearing in the fifth line of the sentence is inputted into the input register 111, that clause is fed to the space determination and addition circuit 112 and the space identification circuit 113. If a space is absent between the numeral and the alphabet as illustrated by this signal, the space identification circuit 113 detects the absence of the space from the combination of the numeral and alphabet, issuing an instruction to the space determination and addition circuit 112 to insert a space. In response thereto, the circuit 112 assembles the clause into the form "17 HI", which is then fed to the two word register 114. The clause "16HI" appearing in the same line of the input sentence is similarly treated.
(3) Functional word processing section (Fig. 12)
This section either converts or deletes part of the signal of the input sentence whenever an indentation signal or a word or line deletion signal is contained in the input sentence 1. Such operation will be described more fully with reference to Fig. 12.
a. Indentation processing
The present apparatus is arranged so that whenever ".PARA" appears at the end of a line, as shown on the eighth line of the input sentence shown in Fig. 1a, or a space corresponding to more than two characters exists at the beginning of a line, as illustrated in the eleventh line of Fig. 2a, a line feed takes place in the output sentence, with the next clause appearing on the new line with a one-character space at the beginning of this line.
An indentation processing section 22a comprises an input register 121, a carriage return line feed (hereinafter abbreviated as "CRLF")/space determination and addition circuit 122, a ".PARA", ".CRLF"/space identification circuit 123, and an output register 124. One of the instructions for the indentation processing is given by inserting the functional word ".PARA" at the end of a line, as illustrated by "HAPPYOSITA .PARA" appearing in the eighth line of the input sentence 1 shown in Fig. 1a. The splitting/combining section 21 has already added a space before and after "." (period) of this functional word, so that an input to the input register 121 is in the form of "HAPPYOSITA . PARA". Since the indentation processing section 22a is designed to process the functional words "." (period) and "PARA" only, the clause signal "HAPPYOSITA" passes through the circuit 122 and is fed to the output register 124. However, when "." (period) is inputted, it is passed through the input register 121 to the ".CRLF"/space determination and addition circuit 122 and the ".PARA", ".CRLF"/space identification circuit 123. The signal indicating "." (period) is temporarily stored in both circuits, and subsequently when "PARA" is fed through the input register 121 to the circuits 122 and 123, the identification circuit 123 identifies ".PARA", supplying an instruction to the circuit 122 to substitute "KX [.CRLF/space]" for the clause "PARA", which is fed to the output register 124. In this manner, an indentation takes place for the next clause to bring it to the beginning of the next line when formatting an output message from the final stage.
Another instruction for the indentation processing may be given by the insertion of a space corresponding to more than two characters at the beginning of a line. Illustrating this by way of example of the line which follows the clause "HETTA" appearing in the tenth line of the sentence shown in Fig. 2a, the clause "HETTA" is passed from the input register 121 through the circuit 122 to the output register 124 without any processing applied thereto. Though not shown in this sentence, when the signal "CRLF", "SPACING" or "SPACING---" is fed through the input register 121 to the circuits 122, 123, the circuit 123 identifies this signal to supply an instruction to the circuit 122 to substitute "KX [.CRLF/space]" for the input signal to be supplied to the output register 124.
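The ".PARA" case of the indentation processing can be pictured with the following Python sketch. It is a simplified illustration and makes several assumptions: the clauses arrive as a list of strings already separated by the splitting/combining section, the period and "PARA" are merged into a single output token, the function name process_indentation is hypothetical, and the alternative instruction (a leading space of more than two characters) is not modelled.

```python
INDENT_TOKEN = "KX [.CRLF/space]"  # substitute quoted in the text


def process_indentation(clauses):
    """Replace the clause pair ".", "PARA" by the indentation token;
    every other clause passes through unchanged."""
    out = []
    i = 0
    while i < len(clauses):
        is_para = (
            clauses[i] == "."
            and i + 1 < len(clauses)
            and clauses[i + 1] == "PARA"
        )
        if is_para:
            out.append(INDENT_TOKEN)
            i += 2  # consume both "." and "PARA"
        else:
            out.append(clauses[i])
            i += 1
    return out


print(process_indentation(["HAPPYOSITA", ".", "PARA", "KONGO"]))
```

A period that is not followed by "PARA" is left in place, mirroring the temporary storage of "." until the next clause is seen.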
b. Deletion of word or line
In the present system, whenever an inputted word is to be deleted as a result of an inadvertent operation, the word to be deleted is either directly followed by two or more X's in succession, e.g. "XX", or is followed by a space and then "XXX". By way of example, considering "HAPPYOS XXX" appearing in the seventh line of the input sentence shown in Fig. 1a, the signal is transmitted, but "HAPPYOS" is deleted at the output of the present system. When it is desired to delete two consecutive words or clauses, a repetition of "XXX" with a space therebetween enables the deletion of such words, as exemplified by "KAGIKAKKO KONG N /// XXX XXX", which is effective to delete "N" and "///".
When a whole line is to be deleted, the functional word "LINEDEL" is to be inserted after the line to be deleted. By way of example, "SANGATU (538.-, 7067 8/LINEDEL" appearing in the eighth line of the sentence shown in Fig. 2a is transmitted, but "SANGATU (538.-, 7067 8/" will be deleted from the output of the system. Where a plurality of lines are to be deleted, the functional word "LINEDEL" is repeated in succession a number of times corresponding to the number of the lines to be deleted.
Referring to Fig. 12, the circuit arrangement will be more specifically described. A word or line deletion section 22b comprises an input register 125 adapted to be connected with the indentation processing section 22a, an XXX.LINEDEL determination circuit 126, an XXX.LINEDEL identification circuit 127, a one message temporary store circuit 128 and an output register 129.
The one message temporary store circuit 128 is capable of temporarily storing one message beginning with the beginning of message cipher 2a and ending with the end of message cipher 4b of each input sentence, and permits the message which is temporarily stored to be fed back to the XXX.LINEDEL determination circuit 126, clause by clause.
Considering the input sentence shown in Fig. 1a for the description of the operation, the successive clauses of the input sentence are accumulated in the temporary store circuit 128 through the input register 125 and the circuit 126.
Assume that "IYINKAIWOXX" appearing in the fifth line of the sentence is fed through the input register 125 to the circuits 126 and 127. The circuit 127 identifies "XX" in this clause, supplying a clause deletion instruction to the determination circuit 126, whereby the circuit 126 is operative to delete "IYINKAIWOXX".
When the clause "HAPPYOS" appearing in the seventh line of the sentence is accumulated in the temporary store circuit 128, and then "XXX" is supplied from the input register 125 to the circuits 126 and 127, the circuit 127 identifies "XXX", supplying an instruction to the temporary store circuit 128 to cause it to feed the clause "HAPPYOS" which precedes "XXX" back to the determination circuit 126. The circuit 126 responds thereto by automatically deleting "HAPPYOS" and "XXX". When "XXX" is supplied from the input register 125 to the determination circuit 126 and the identification circuit 127 after the clauses "N ///" appearing in the thirteenth line are fed from the input register 125 to the temporary store circuit 128, the identification circuit 127 identifies "XXX", supplying an instruction to the temporary store circuit 128 to cause it to feed back "///" to the determination circuit 126, whereby "///" and "XXX" are both deleted. Subsequently, when "XXX" is again supplied from the input register 125 to the circuits 126 and 127, the identification circuit 127 identifies "XXX" in a similar manner to cause the temporary store circuit 128 to feed "N" back to the determination circuit 126 for deleting both "N" and "XXX". In this manner, "NINENNKAN XXX" appearing in the fourteenth line and "AKIRAKANISITA XXX" appearing in the eighteenth line of the sentence are both deleted. When the end of message cipher 4b of the input sentence 1 is fed to the temporary store circuit 128, a transfer of the signal to the following numeral/symbol processing section 23 is initiated, beginning with the beginning of message cipher 2a.
On the other hand, when a line delete instruction is contained in the input sentence as shown in Fig. 2a, the section 22 operates as follows. The clause "17ill XXX" appearing in the fifth line of this input sentence is automatically deleted as mentioned previously, and the clauses "CRLF SANGATU (538.-, 7067 8 /" appearing in the seventh and eighth lines, which have already been processed by the splitting/combining section 21, are fed from the input register 125 to the temporary store circuit 128 in succession. When the following clause "LINEDEL" is supplied from the input register 125 to the circuits 126 and 127, the identification circuit 127 identifies it, causing the clauses beginning with "CRLF" to be fed back to the determination circuit 126 where the entire line "CRLF SANGATU (538.-, 7067 8/ LINEDEL" is automatically deleted in one step. When the end of message cipher 4b is supplied to the temporary store circuit 128, a transfer of the signal to the output register 127 is initiated, beginning with the beginning of message cipher 2a.
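The three deletion marks described above can be illustrated with a minimal sketch. It assumes that the clauses arrive as a list of strings, that "XX" is attached to the end of the clause it deletes, and that a Python list can stand in for the temporary store circuit 128; the function name `apply_delete_marks` is hypothetical and does not appear in the specification.

```python
def apply_delete_marks(clauses):
    """Apply the "XX", "XXX" and "LINEDEL" deletion marks to a clause list.

    Assumption: `store` plays the role of the temporary store circuit 128,
    holding the clauses accumulated so far.
    """
    store = []
    for clause in clauses:
        if clause == "LINEDEL":
            # Delete everything back to, and including, the latest "CRLF".
            while store and store[-1] != "CRLF":
                store.pop()
            if store:
                store.pop()
        elif clause == "XXX":
            # Delete the clause that immediately precedes the mark.
            if store:
                store.pop()
        elif clause.endswith("XX"):
            # A clause carrying a trailing "XX" deletes itself.
            continue
        else:
            store.append(clause)
    return store


print(apply_delete_marks(["HAPPYOS", "XXX", "IYINKAIWOXX", "NINENNKAN"]))
```

Because each "XXX" removes only the most recently stored clause, two marks in succession remove the two preceding clauses one after the other, which matches the behaviour described for the thirteenth line.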
(4) Numeral and symbol processing section (Fig. 13)
Numerals and symbols in the input sentence are already processed by splitting them or combining them with adjacent words or symbols in the preceding splitting/combining section 21. Subsequently, the symbols are subjected to a one-to-one conversion while the numerals are processed in the following manner in the section 23. As termed herein, the term "numeral" refers to a succession of numerical figures, a floating point and a comma representing the significance of digits, a numeral word such as "hundred", and consecutive figures appearing in a clause which contains numerical figures. In the present embodiment, a numeral word appearing in a numeral is converted into kanji, a comma into a punctuation mark, and a floating point into a demarcation mark (.). A numerical figure is converted into a Japanese figure whenever it accompanies a numeral word, and is converted into either a Japanese figure or an Arabian figure depending on the kind of word, such as "year" or "pieces", which follows the figure if it does not contain a numeral word.
Referring to Fig. 13, the numeral and symbol processing section 23 comprises an input register 131, a numeral conversion circuit 132, a numeral identification circuit 133, a symbol conversion circuit 134, a symbol identification circuit 135 and an output register 136.
Describing the operation of the processing section 23 with reference to the input sentence shown in Fig. 2a, the clause "16HI" appearing in the fifth line thereof is already processed in the splitting/combining section 21, and hence the split clauses "16 HI" are fed from the section 22 to the input register 131, and thence to the numeral conversion circuit 132 and the numeral identification circuit 133. The identification circuit 133 identifies the numeral, supplying an instruction to the conversion circuit 132 to convert the clause "16" into "[16]" and "HI" into "[day]". The conversion circuit 132 provides a converted output "KX[16 day]" of Japanese code, which is passed to the output register 136 without being subjected to any processing in the symbol conversion circuit 134.
When the clauses "519 MAN 4950" appearing in the seventh line of the sentence are successively supplied to the input register 131, the clause "519" is initially fed to the circuits 132 and 133. The identification circuit 133 supplies an instruction to the conversion circuit 132 to convert it into a Japanese figure, thus converting the clause "519" into "KX[519]" of Japanese code. Both circuits 132 and 133 temporarily store the figure clause until the next clause is supplied.
Subsequently when the clause "MAN" is supplied from the input register 131 to the circuits 132 and 133, since the identification circuit 133 has already noted that the preceding clause represented a figure, it supplies an instruction to the conversion circuit 132 to convert the clause "MAN" into "KX[ten thousands]". Subsequent to the conversion, the clauses "KX[519]" and "KX[ten thousands]" which have been temporarily stored in the conversion circuit 132 are fed through the circuit 134 to the output register 136. When the signal indicating the clause "4950" is fed from the input register 131 to the circuits 132 and 133, the identification circuit 133 identifies these numerical figures, supplying an instruction to the conversion circuit 132 to convert "4950" into "KX[4950]", which is temporarily stored in the circuits 132 and 133. Subsequently when the clause "NIN" is fed from the input register 131 to the circuits 132 and 133, the fact that "NIN" is unrelated to a numeral causes the preceding clause "KX[4950]" to be fed from the conversion circuit 132 to the output register 136, followed by the clause "NIN", which is passed thereto without being processed in any manner. In this manner, the numeral conversion circuit 132 and the numeral identification circuit 133 cooperate with each other to convert an Arabian figure appearing in the input sentence into a corresponding Japanese figure. However, in the present example, the Arabian figures indicative of the message number and the message date which appear in the portion of the sentence from the beginning of message cipher to the priority cipher, as well as those Arabian figures to which "VX" or "NX" is added, are passed without conversion.
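The dependence of the numeral word conversion on the preceding clause can be sketched as follows. This is a simplified illustration only: the one-entry table `NUMERAL_WORDS` and the function name `convert_numerals` are hypothetical, the bracketed English glosses stand in for the Japanese code as they do in the description, and the temporary storage in the circuits 132 and 133 is reduced to a single flag.

```python
# Hypothetical numeral word table; the description only gives "MAN".
NUMERAL_WORDS = {"MAN": "ten thousands"}


def convert_numerals(clauses):
    """Tag figures, and numeral words that follow a figure, with "KX"."""
    out = []
    prev_was_figure = False  # what the identification circuit 133 remembers
    for clause in clauses:
        if clause.isdigit():
            out.append(f"KX[{clause}]")
            prev_was_figure = True
        elif prev_was_figure and clause in NUMERAL_WORDS:
            # A numeral word is converted only when it follows a figure.
            out.append(f"KX[{NUMERAL_WORDS[clause]}]")
            prev_was_figure = False
        else:
            # Anything unrelated to a numeral is passed through unchanged.
            out.append(clause)
            prev_was_figure = False
    return out


print(convert_numerals(["519", "MAN", "4950", "NIN"]))
```

Clauses that already carry "VX" or "NX" are not purely numeric strings in this sketch, so they fall through to the last branch and are passed without conversion.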
The conversion of symbols in the processing section 23 will be described below with reference to the input sentence shown in Fig. 2a. When the clause ":SITUGYOO:" appearing in the fourth line of the input sentence is supplied to the circuit of Fig. 13, the initial colon ":" is fed through the input register 131 to the circuits 132 and 133. The colon is not processed in any manner in these circuits, and is passed to the following symbol conversion circuit 134 and the symbol identification circuit 135 where the symbol ":" is identified by the circuit 135, which supplies an instruction to the symbol conversion circuit 134 to convert it into a Japanese code "KXI(H)1" in accordance with the scheme shown in Fig. 26 to be fed to the output register 136. The clause "SITUGYOO" is passed to the output register 136 with no processing. The last symbol ":" is similarly converted into "KX[G)I" in the circuit 134 to be fed to the output register 136.
(5) Hiragana conversion section (Fig. 14)
This section converts any unprocessed clause represented in Roman alphabet characters of the input sentence (which are encoded in the pseudo international code using six bits per character) into hiragana according to the kana code employing six bits per character. It will be recalled that under the paragraph of "Definition", a hiragana is to be denoted according to the hiragana-denotation character conversion table T2 for convenience. It is emphasized that such denotation characters are used for the purpose of convenience only, and that what is actually represented is a hiragana corresponding to the denotation characters.
For converting a clause represented in Roman characters into hiragana, the present invention adopts the following rules:
a. An assimilated sound is represented by the duplication of the next following consonant. (The underscoring in the following examples is added for the purpose of emphasis.)
Example: HAPPYOOSITA → (H1T0P2Y5A3S2T1); NETTCHUU → (N4T0T2Y3A3)
b. A syllabic nasal is represented by the insertion of "'" (apostrophe) or "Q" following "N" whenever the succeeding letter is a vowel or "Y".
Example: ZEN'EI → (Z4N0A4A2); ZENQEI → (same as above)
c. Whenever a vowel "O" appears in succession, the second vowel is denoted by "(A3)".
Example: DENTOOO → (D4N0T5A3A5)
d. When it is desired not to convert vowels "O" appearing in succession into a long vowel, the second vowel is preceded by "'" (apostrophe) or "Q".
Example: KO'ORI or KOQORI → (K5A5R2)
e. A particle "WO", "YE" or "WA" is converted into "(W5)", "(H4)" or "(H1)", respectively.
Example: IYINKAIWA → (A2A2N0K1A2H1); HOOKOKUSHOWO → (H5A3K5K3S2Y5W5); GORUFUYE → (G5R3H3H4)
f. If the above rules a. to e. are not applicable, the conversion is effected in accordance with the Roman character-to-hiragana conversion table given in Table 1. This table includes the Roman spellings for both the Hepburn system and Japanese rendering.
Fig. 14 shows a block diagram of a kana conversion section 24. This conversion section 24 comprises an input register 141, a KX, VX, LX and NX determination circuit (hereinafter referred to as four X determination circuit) 142, a hiragana conversion circuit 143, an assimilated sound, syllabic nasal and consecutive "O" identification circuit (hereinafter referred to as three specific sound identification circuit) 144, and an output register 145. A clause to which any one of "KX", "VX", "LX" or "NX" is added is directly passed through the four X determination circuit 142 to the output register 145.
Describing the operation of the conversion section 24 with reference to the input sentence shown in Fig. 3a, when the clauses ":ZEN'EI OPEN GOLF:" appearing in the fourth line are supplied, the initial symbol ":" is directly fed to the output register 145 since it is inputted in the form "KX[G)]" as a result of the conversion in the preceding circuit and the portion "KX" is detected by the four X determination circuit 142. When the next clause "ZEN'EI" is fed from the input register 141, the four X determination circuit 142 detects the absence of any one of the four X's, whereby the clause is fed to the hiragana conversion circuit 143 and the three specific sound identification circuit 144. The circuit 144 identifies that it contains a syllabic nasal, supplying an instruction to the conversion circuit 143 to convert it into a hiragana of the kana code "(Z4N0A4A2)" in accordance with Table 1 for output to the output register 145. When the next clause "OPEN" was inputted to the present system, the determination circuit 103 and the storage circuit 104 of the input stage 10 added "LX" thereto. Similarly, "LX" was added to the next succeeding clause "GOLF". As a consequence, each of these clauses is fed to the output register 145 without change since the functional word "LX" is detected by the four X determination circuit 142. The last symbol ":", which is already converted into "KXI]" in the previous circuit, is passed to the output register 145 without being processed in any manner in the conversion section 24. When the individual clauses appearing in the fifth and subsequent lines are supplied, those clauses to which one of the four X's, namely "KX", "LX", "VX" and "NX", is added are directly fed to the output register 145, while clauses without one of the four X's are converted into hiragana of the kana code before they are fed to the output register 145.
The conversion from the Roman characters to hiragana which takes place in the hiragana conversion circuit 143 is achieved by partitioning an input clause fed in the form of Roman characters between each vowel, exclusive of an apostrophe, and a following character, either vowel or consonant, to form a segment, and retrieving a hiragana which corresponds to each segment from a memory which stores the conversion table shown in Table 1. Describing more specifically with reference to the example of "ZEN'EI", when the signal "Z" is inputted, it is retained, because it is a consonant, until the next vowel "E" is supplied, thereby forming a segment "ZE". A corresponding hiragana "(Z4)" is retrieved from the conversion table. When "N" is then inputted, the next signal must be awaited since "N" represents a consonant. However, because the next signal is "'" (apostrophe), one segment is formed by "N'", retrieving a corresponding character "(N0)". The following signals "E" and "I" represent vowels, so that they are treated as independent segments, permitting corresponding characters "(A4)" and "(A2)" to be retrieved. Clauses written in the Roman characters are converted into hiragana clauses by such manner of conversion. However, there are exceptions which are performed by the hiragana conversion circuit 143 in response to an instruction from the three specific sound identification circuit 144. Specifically, when "O" appears in succession, the second appearing "O" is converted into "(A3)" rather than "(A5)". Also, for an assimilated sound comprising two consecutive consonants, the initial consonant is converted into "(T0)".
When the clause "DENTOOO" appearing in the fifth line of the input sentence shown in Fig. 3a is supplied, it is fed to the circuits 143 and 144. Thereupon the identification circuit 144 identifies consecutive "O" sounds, supplying an instruction to the hiragana conversion circuit 143 to convert that clause into a corresponding clause "(D4N0T5A3A5)" to be fed to the output register 145. When a clause "NETCHUU" appearing in the eighth line is supplied to the circuits 143 and 144, the three specific sound identification circuit 144 detects the presence of an assimilated sound, supplying an instruction to the hiragana conversion circuit 143 to convert it into "(N4T0T2Y3A3)" to be fed to the output register 145.
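The segmentation and the three special sounds can be sketched in a few lines. The sketch below is not the conversion table of Table 1: it assumes, purely for illustration, that every consonant row is denoted by its own Roman letter and every vowel by the digits 1 to 5 (A, I, U, E, O), so Hepburn spellings such as "CH", "SH" or "J" and the particle rule e. are not handled. The function name `romaji_to_codes` is hypothetical.

```python
VOWELS = {"A": 1, "I": 2, "U": 3, "E": 4, "O": 5}


def romaji_to_codes(clause):
    """Convert a Roman character clause into hiragana denotation codes."""
    codes = []
    prev_o = False  # True when the previous kana was written with a plain "O"
    i, n = 0, len(clause)
    while i < n:
        ch = clause[i]
        nxt = clause[i + 1] if i + 1 < n else ""
        if ch in ("'", "Q"):
            prev_o = False            # separator: blocks the long vowel rule
            i += 1
        elif ch in VOWELS:
            if ch == "O" and prev_o:
                codes.append("A3")    # second "O" of a run
                prev_o = False
            else:
                codes.append(f"A{VOWELS[ch]}")
                prev_o = ch == "O"
            i += 1
        elif ch == "N" and nxt not in VOWELS and nxt != "Y":
            codes.append("N0")        # syllabic nasal
            prev_o = False
            i += 1
        elif nxt == ch:
            codes.append("T0")        # assimilated sound (doubled consonant)
            prev_o = False
            i += 1
        elif nxt == "Y" and i + 2 < n and clause[i + 2] in VOWELS:
            vowel = clause[i + 2]     # contracted sound, e.g. "PYO"
            codes += [f"{ch}2", f"Y{VOWELS[vowel]}"]
            prev_o = vowel == "O"
            i += 3
        elif nxt in VOWELS:
            codes.append(f"{ch}{VOWELS[nxt]}")  # consonant + vowel segment
            prev_o = nxt == "O"
            i += 2
        else:
            raise ValueError(f"cannot segment {clause!r} at position {i}")
    return "(" + "".join(codes) + ")"


print(romaji_to_codes("ZEN'EI"))
print(romaji_to_codes("DENTOOO"))
```

A consonant is held until its vowel arrives, which mirrors the way the signal "Z" is retained until "E" is supplied; the three special cases are checked before the ordinary consonant plus vowel segment so that they take priority.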
4. Summary of kana-to-kanji conversion stage
The principal function of the conversion stage 30 is to convert clauses which are obtained at the output of the preceding stage 20 in the form of hiragana of the kana code into clauses of kanji and/or hiragana of the Japanese code. In addition, the conversion stage 30 is effective to convert a word which is converted by the input stage 10 into Roman characters of the kana code and which has the functional word "LX" into a katakana word of the Japanese code, eliminating the functional word. The stage 30 is also effective to convert a word which is converted by the input stage 10 into Roman characters of the kana code or into Arabian figures and which has the functional word "VX" or "NX" into a corresponding word of the Japanese code, eliminating the associated functional word. Finally, the stage 30 is effective to eliminate the functional word "KX" from a word which is converted into letters or symbols of the Japanese code by the input stage 10 and which has such functional word.
Referring to Fig. 15, the kana-to-kanji conversion stage 30 comprises a kana clause input section 31, a clause splitting section 32, a vocabulary memory 33, an independent word/adjunctive catenation examining section 34, a grammatical information memory 35, an independent word/non-adjunctive conjugation examining section 36, a word construction determination section 37, a concept information examining section 38, a two word determination section 39 and a kanji conversion section 310.
When clauses which have been converted into hiragana by the preceding kana conversion section 24 are supplied to the kana clause input section 31, the latter temporarily stores the clauses, and feeds them, clause by clause, to the clause splitting section 32 successively whenever the processing of the preceding clause is completed. The clause splitting section 32 supplies the same clause as the hiragana clause supplied thereto to the vocabulary memory 33 where a scan of the memory is made to determine if data can be retrieved therefrom which corresponds to the clause in its original form. When it is determined that a retrieval cannot be made using the clause in its original form, the last character is removed to try a rescan. Such procedure is repeated until a retrieval is made possible. The vocabulary memory 33 stores a plurality of words, each together with an index word with which a scan and a retrieval are made and with its associated additional information, as tabulated in Table 6. Specifically, each index word is associated with information representing the part of speech, variety, classification, meaning and concept as well as the Japanese code of the word retrieved by the index word. Hereinafter, a set of such information will be referred to as vocabulary information. Vocabulary information exclusive of the Japanese code is stored in the memory according to the kana code in which six bits per character are used. The part of speech, variety and classification of the vocabulary information are stored in predetermined positions of a given data track in encoded form while the meaning, concept and the Japanese code are encoded in the hiragana form at the corresponding positions of the data track. It will be noted from the foregoing description that vocabulary information may be retrieved from the memory 33 for a given clause, or may be retrieved therefrom for a fraction of the clause formed by removing one or more characters from the end of it.
Any retrieved vocabulary information is fed back to the clause splitting section 32, and whenever it has been successfully retrieved without removal of any character from the original clause, no splitting of the clause is made. However, when the removal of one or more characters from the end of the clause is necessary to achieve a retrieval, the clause splitting section 32 is effective to split the original clause into an initial portion which has been used to achieve the retrieval successfully and a remaining clause portion. The retrieved vocabulary information and the remaining clause portion for which the retrieval failed are fed to the independent word/adjunctive catenation examining section 34.
Referring to Fig. 4a for a more specific description of such manner of operation, when a hiragana clause "(T5D2Y5A3K5K3N5)" corresponding to a Roman character clause "TOJOOKOKUNO" appearing in the fifth line is fed from the kana clause input section 31 to the clause splitting section 32, a scan of the vocabulary memory 33 is made for vocabulary information having an index word which corresponds to the clause "(T5D2Y5A3K5K3N5)" in its longest form. If the scan fails to find corresponding vocabulary information, one character is successively removed from the end of the clause. Assume that "(T5D2Y5A3)" is successfully retrieved in this manner. This permits a retrieval of vocabulary information having an index word corresponding to "(T5D2Y5A3)".
Accordingly, the clause splitting section 32 splits the original clause into two portions, "(T5D2Y5A3)" and the remaining portion which corresponds to "(K5K3N5)", and supplies them to the following independent word/adjunctive catenation examining section 34. At this time, the clause "(T5D2Y5A3)" is followed by the additional information associated with the retrieved word.
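The scan with successive removal of the last character amounts to a longest prefix lookup. The following is a minimal sketch under stated assumptions: the vocabulary memory 33 is modelled as a Python dictionary keyed by tuples of kana codes, the two entries and their `part_of_speech` values are illustrative placeholders, and the names `parse_codes` and `split_clause` are hypothetical.

```python
import re


def parse_codes(denotation):
    """Split a denotation string such as "(K5K3N5)" into kana codes."""
    return tuple(re.findall(r"[A-Z][0-9]", denotation))


def split_clause(kana, vocabulary):
    """Return (head, vocabulary information, remainder) for the longest
    leading portion of `kana` found in `vocabulary`, or None."""
    for end in range(len(kana), 0, -1):   # remove one character at a time
        head = kana[:end]
        if head in vocabulary:
            return head, vocabulary[head], kana[end:]
    return None


# Illustrative stand-in for the vocabulary memory 33.
VOCABULARY = {
    parse_codes("(T5D2Y5A3)"): {"part_of_speech": "independent word"},
    parse_codes("(K5K3)"): {"part_of_speech": "postfix"},
}

clause = parse_codes("(T5D2Y5A3K5K3N5)")
head, info, rest = split_clause(clause, VOCABULARY)
print(head, info, rest)
```

An empty remainder corresponds to the case in which the clause is retrieved in its original form and no splitting is made.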
The examining section 34 checks the catenation capability of an independent word and an adjunctive to see if they are catenable. To this end, the grammatical information memory 35 stores a plurality of index words corresponding to adjunctives together with their associated information as tabulated under Table 7 in the same manner as in the vocabulary memory 33. A combination of such index word and associated information will be referred to hereafter as "adjunctive information". In addition to the adjunctive information, the grammatical information memory 35 stores part of speech information using various parts of speech as index words, as illustrated under Table 5. It is to be understood that the part of speech information stored in the memory 35 indicates those parts of speech and varieties of words which can be catenated with a particular word the part of speech of which is used as an index word for retrieval from the memory 35. This table shows a group of adjunctives which are catenated with each part of speech, as illustrated by a proper noun N which may be catenated with groups M and N of adjunctives. It also indicates the variety of word which can be catenated with each part of speech or word, as exemplified by a noun N in the prefix which can be catenated with an independent word (pre-word) or a noun representing an independent word which is indicated by a circle. The grammatical information is stored or registered by using a code for the part of speech as index word, encoding adjunctives which are catenable therewith, and entering either "1" indicative of "yes" or "0" indicative of "no" at the predetermined positions for the respective parts of speech.
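The "1" or "0" entries at predetermined positions can be pictured as one bit string per part of speech. The sketch below assumes a table of this shape; only the statement that a proper noun may be catenated with groups M and N of adjunctives is taken from the description, while the third group "P", the "verb" row and the names `ADJUNCTIVE_GROUPS`, `CATENATION_TABLE` and `can_catenate` are hypothetical.

```python
# Column order of the predetermined positions (group "P" is hypothetical).
ADJUNCTIVE_GROUPS = ("M", "N", "P")

# One row per part of speech used as index word: "1" = yes, "0" = no.
CATENATION_TABLE = {
    "proper noun": "110",   # catenable with groups M and N
    "verb": "001",          # hypothetical row for illustration
}


def can_catenate(part_of_speech, adjunctive_group):
    """Look up whether a word of the given part of speech may be
    followed by an adjunctive of the given group."""
    row = CATENATION_TABLE.get(part_of_speech)
    if row is None:
        return False  # no grammatical information registered
    return row[ADJUNCTIVE_GROUPS.index(adjunctive_group)] == "1"


print(can_catenate("proper noun", "M"))
```

Keeping the check as a table lookup means that the grammar can be changed by editing stored data rather than the examining logic, which appears to be the intent of registering the information in a memory.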
Returning to the description of the operation, of the vocabulary information and any split remaining clause portion which have been fed from the clause splitting section 32 to the examining section 34, the remaining clause portion is passed to the grammatical information memory 35, where a scan is made for adjunctive information having an index word which corresponds to the remaining clause in its longest form or in a reduced form which is obtained by removing one or more characters from the end of the remaining clause. Any retrieved adjunctive information is returned to the examining section 34 where the grammatical catenability between the vocabulary information from the clause splitting section 32 and the retrieved adjunctive information from the grammatical information memory 35 is checked on the basis of the part of speech information contained in both. If the catenation is found possible, they are fed to the independent word/non-adjunctive conjugation examining section 36. However, if the grammatical information memory 35 does not store adjunctive information having an index word corresponding to the remaining clause portion which is supplied thereto, the remaining clause portion is returned to the examining section 34 and thence to the clause splitting section 32 to effect a scan for any corresponding word which is stored in the vocabulary memory 33 in a similar manner as mentioned above.
Describing such operation more specifically in terms of the example used above, the clause "(K5K3N5)" which is obtained by the splitting process in the clause splitting section 32 is fed through the examining section 34 to the grammatical information memory 35 where a scan is made to see if a word is stored therein which corresponds to the original form "(K5K3N5)". If the scan fails to find a corresponding word, one character is successively removed from the end of this clause. However, since none of the clauses "(K5K3N5)", "(K5K3)" and "(K5)" is an adjunctive, the memory 35 does not store any adjunctive information having an index word which corresponds to one of these clauses. Therefore, the signal is returned to the independent word/adjunctive catenation examining section 34 where it is determined that the clause "(T5D2Y5A3)" cannot be catenated with the returned clause "(K5K3N5)". Thereupon the clause "(K5K3N5)" is returned to the clause splitting section 32 for a scan for a corresponding word in the vocabulary memory 33 for the second time. A word "(K5K3)" corresponding to the clause "(K5K3N5)" from which the last character "(N5)" is removed is retrievable from the memory 33, so that the clause splitting section 32 splits the original word into two clauses "(K5K3)" and "(N5)", and feeds them to the independent word/adjunctive catenation examining section 34 together with the vocabulary information for "(K5K3)". Subsequently, a search is made to see if adjunctive information corresponding to the clause "(N5)" can be retrieved from the grammatical information memory 35. When it is retrieved and fed to the independent word/adjunctive catenation examining section 34, the latter finds that the clause "(K5K3)" stands as a postfix and is catenable with the clause "(N5)" which represents an adjunctive. When the catenability is thus determined, the clauses "(T5D2Y5A3)" and "(K5K3N5)" are fed to the examining section 36 to see if they can be grammatically conjugated together.
The examining section 36 checks if the word or clause which has been fed from the independent word/adjunctive catenation examining section 34 can be catenated or conjugated with the preceding word or clause. The check is achieved by comparing the part of speech information of the newly fed word or clause against that of the preceding word or clause in accordance with the stored data in the grammatical information memory 35. When it is determined that both words or clauses can be conjugated together, the preceding clause is fed to the following word construction determination section 37.
Referring to the previous example for a more specific description of this operation, the part of speech information is retrieved from the grammatical information memory 35 for the vocabulary information of "(T5D2Y5A3)" and the adjunctive information of "(K5K3N5)" to see if they can be grammatically catenated together. In this instance, these clauses represent an independent word and a postfix, and can therefore be conjugated together. In this manner, it is determined that the catenation and the conjugation apply for the clause "(T5D2Y5A3K5K3N5)", which is temporarily stored in the independent word/non-adjunctive conjugation examining section 36 for use in the examination of conjugation with the following clause. It is to be understood that the combining capability of an independent word with an adjunctive is referred to herein as "catenation capability" while that of an independent word with a non-adjunctive word is referred to as "conjugation capability", with the understanding that the term "conjugation" as here used is to be distinguished from the conjugation of a verb.
The next following clause in the fifth line of the input sentence shown in Fig. 4a is "IPPOOTEKINA", and a corresponding word "(A2T0P5A3T4K2N1)" is supplied to the clause splitting section 32. In a similar manner as mentioned above, a character is successively removed from the end of the clause "(A2T0P5A3T4K2N1)" for the purpose of scanning the vocabulary memory 33. Assuming that a word "(A2T0P5A3T4K2)" is found retrievable from the vocabulary memory, vocabulary information having an index word which corresponds thereto is supplied from the section 32 to the examining section 34 together with the clause "(N1)". The section 34 examines if the clause "(A2T0P5A3T4K2)" can be catenated with the clause "(N1)", which is found possible in this instance.
Therefore, the clause "(A2T0P5A3T4K2N1)" is supplied to the examining section 36 where the conjugation capability of the new clause "(A2T0P5A3T4K2)" with the clause "(K5K3N5)" which has been temporarily stored is checked by retrieving from the grammatical information memory 35, using the part of speech information of the adjunctive information for "(N5)" and of the vocabulary information for "(A2T0P5A3T4K2)" as index words, those parts of speech listed in the memory 35 for each of the index words as catenable with the respective part of speech used as the index word. In this instance, a word construction comprising an adjunctive followed by an independent word is not permitted, resulting in a determination that the conjugation is impossible. Since the clauses "(... K5K3N5)" and "(A2T0P5A3T4K2 ...)" are different, the clause "(T5D2Y5A3K5K3N5)" is fed to the word construction determination section 37, and "(K5A3D5A3G1)" corresponding to "KOODOGA" is supplied to the clause splitting section 32.
To illustrate a slightly different manner of operation, the processing of a clause which has been once fed to the independent word/non-adjunctive conjugation examining section 36, by returning it to the clause splitting section 32 again, will be described. When "(M1T2G1A2N1A2)" corresponding to the clause "MACHIGAINAI" appearing in the eighteenth line of the sentence shown in Fig. 2a is supplied to the clause splitting section 32, a scan is made in the vocabulary memory 33 for an index word which corresponds initially to this clause in its original form and then to a sequentially reduced form which is obtained by removing one character successively from the end of the clause. Assume that "(M1T2G1A2)" (an independent word, a noun not having an affix) is retrieved, and that the clause splitting section 32 has divided the original clause into two clauses "(M1T2G1A2)" and "(N1A2)", which are supplied to the independent word/adjunctive catenation examining section 34 together with the vocabulary information for "(M1T2G1A2)". This results in scanning the grammatical information memory 35 for one of the adjunctives which are connectable with a noun and which corresponds to "(N1A2)". However, the scan fails, so that it is determined that the catenation is impossible. For this reason, the clause "(N1A2)" is returned to the clause splitting section 32 again, and the vocabulary memory 33 is scanned again for "(N1A2)". Assuming that "(N1A2)" having vocabulary information including the Japanese code for a homonym [NAI] is retrieved, there is no need to examine the catenability since "(N1A2)" is not split. Consequently, it is passed through the catenation examining section 34 to be fed to the independent word/non-adjunctive conjugation examining section 36 where the conjugation capability between the preceding clause "(M1T2G1A2)" and "(N1A2)" is examined by retrieving those parts of speech which are stored in the grammatical information memory 35 as catenable with each of the parts of speech corresponding to the respective clauses and which are used as index words during the retrieval. In this instance, the conjugation is inapplicable since the combination represents an independent word (not having an affix) + postfix.
However, the clauses "(M1T2G1A2)" and "(N1A2)" are not fed to the word construction determination section 37 since they are located in a single clause, but they are returned to the clause splitting section 32 again in the original form of the clause "(M1T2G1A2N1A2)".
In the preceding instance, a scan of the vocabulary memory 33 has been made for the clause "(M1T2G1A2)", but now a scan is made for a reduced form "(M1T2G1)". In this instance, "(M1T2G1)" is retrieved, and the clause splitting section 32 splits the original clause into "(M1T2G1)" and "(A2N1A2)". The clause "(A2N1A2)" is fed to the catenation examining section 34 together with the added information for "(M1T2G1)". Thereupon, a scan is made in the grammatical information memory 35, starting with the original form "(A2N1A2)" and removing one character successively from the end of the clause. In this manner, a successful retrieval is achieved for "(A2N1)". The grammatical catenability is examined by retrieving the part of speech classification stored in the grammatical information memory 35 for "(M1T2G1)" and "(A2N1)", resulting in the determination that a catenation is possible between "(M1T2G1)" (a w-series verb having five conjugations) and "(A2N1)" (an inflection associated with a w-series verb having five conjugations). As a result of the catenation, an inflection for the w-series verb is obtained.
Then "(A2)" is retrieved from the grammatical information memory 35, and since it represents an adjunctive, the grammatical catenation capability is examined by retrieving the part of speech classification for "(A2N1)" and "(A2)". Thereupon, a determination is made that the catenation is possible between "(A2N1)" (an inflection associated with a w-series verb having five conjugations) and "(A2)" (an adjunctive attached to an inflection of the w-series verb). As a consequence, the clause "(M1T2G1A2N1A2)" is fed to the examining section 36, while the clause which follows it, namely "." (period), is directly passed through the clause splitting section 32 and the examining section 34, is further passed without examination in the examining section 36, and is unconditionally fed to the word construction determination section 37 since it is already converted into an output character in the numeral/symbol processing section 23.
A further different manner of operation will be described with reference to a clause "(W1A2T4K2M1S2T1)" which corresponds to "WAITEKIMASHITA" appearing in the eighth line of the input sentence shown in Fig. 3a. When this clause is supplied to the clause splitting section 32, a scan is made in the vocabulary memory 33, beginning with the original form "(W1A2T4K2M1S2T1)" and sequentially removing one character from the end of the clause. Assume that the retrieval failed because a corresponding word is not stored in the memory. Then the input clause is not split in the section 32, but is fed to the examining section 34 and thence to the examining section 36. The conjugation capability with the preceding clause cannot be examined in the examining section 36 because the clause is an unretrievable word, and the preceding clause is fed to the word construction determination section 37. Since "." (period) follows the clause "(W1A2T4K2M1S2T1)", this clause is also fed to the word construction determination section 37.
The word construction determination section 37 determines if the word construction of the clause fed thereto is grammatically proper in accordance with the additional information of the individual words. The determination is made on the basis of typical word constructions shown in the table of the paragraph 7 of "Denotation of Japanese Language", determining if a coincidence with a particular combination in the table is achieved. In this manner the word construction within a clause is examined, and if the clause does not contain an independent word, the clauses which immediately precede and follow it are examined.
The concept information contained in either vocabulary or adjunctive information of each clause is examined in the concept information examining section 38. The examination of the concept information is principally utilized to distinguish homonyms, selecting a proper word from several homonyms. The selection of a proper homonym will be described later in further detail. The concept information examining section 38 operates on the basis of a phrase, and any examined phrase is fed, clause by clause, to the subsequent two word determination section 39. A word for which the various sections up to the concept information examining section 38 failed to decide a single homonym will be outputted by the determination section 39 by adding parentheses before and after the two words as they are derived in a kanji and kana compounded sentence, with a period inserted between the two words. In this instance, when more than two homonyms are fed to the two word determination section 39, the initial two words will be outputted while the third and any subsequent word will be deleted. The two word determination section 39 successively feeds clauses to the kanji conversion section 310.
The kanji conversion section 310 is operative to convert a hiragana of the kana code into a kanji and/or hiragana word of the Japanese code by leaving the Japanese code contained in its associated vocabulary or adjunctive information and deleting all other information. The conversion section 310 also comprises means for converting a word having the functional word "LX" into a hiragana of the Japanese code and eliminating the functional word, means for converting a word having the functional word "VX" or "NX" into corresponding characters of the Japanese code and eliminating such functional word, and means for eliminating any functional word "KX" which is attached to a word of the Japanese code.
5. Selection of homonyms
(1) Homonyms
As mentioned under the paragraph 6 of "Denotation of Japanese Language", there are an increased number of homonyms having the pattern:
A plurality of character trains (a plurality of concepts) - a single phonetic series
Such homonyms may comprise independent words as well as a combination of an independent word and an adjunctive. Consequently, in the conversion into a kanji and kana compounded sentence of a Roman sentence of Japanese language, which is a phonetic representation of Japanese in Roman characters, it is of great importance to select a correct one from a plurality of homonyms. In the system of the invention, the following nine processing steps are taken principally in the kana-to-kanji conversion stage for the proper selection of homonyms.
(2) Selection according to the content of an input sentence
As mentioned previously, vocabulary information stored in the vocabulary memory 33 includes "classification data" indicative of the field in which a particular word is principally used. In accordance with the invention, a separate memory is used for storing vocabulary information according to such classifications. The classification may be made in the grouping of "fundamentals", "sports", "economy", "science" or the like. In accordance with the invention, the classification of the content of an input sentence is specified upon formulating the input sentence, so that an inquiry can be made about the input sentence to a memory having vocabulary information of a particular classification. As a result of this, a retrieval of a homonym belonging to a different classification can be prevented. In the embodiment shown in the drawings, the address cipher 2a in the input sentence I is used to specify the classification, but it should be understood that a separate cipher may be used for this purpose.
Referring to the input sentence I illustrated in Fig. 4a for a more specific description, the address cipher 2a of this input sentence I is "GAI", which specifies the classification of fundamental words. When a hiragana input "(T5A3S2Y3)" corresponding to the initial clause of "TOOSHU KAIDANWO" (the meeting of the party heads) appearing in the eighth line is supplied from the clause splitting section 32 to the vocabulary memory 33, the latter may store two items of vocabulary information having the same index word corresponding to this clause but which represent "party head" under the classification of the fundamental words and "pitcher" under the classification of sports terms, both in the Japanese code. The input is compared against the particular memory which stores fundamental words, thereby retrieving the vocabulary information in the Japanese code having the index word "(T5A3S2Y3)" and corresponding to "party head" which is stored in this memory. In this manner, a correct selection is made from two homonyms "party head" and "pitcher".
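A minimal sketch of this selection by classification is given below. It assumes one dictionary per classification; the mapping of "GAI" to fundamental words and the two entries for "(T5A3S2Y3)" come from the text, while the data layout and function name are illustrative assumptions.

```python
# Address cipher -> classification. Only "GAI" is given in the text.
CLASS_BY_ADDRESS = {"GAI": "fundamentals"}

# One separate memory per classification (illustrative layout).
MEMORIES = {
    "fundamentals": {"T5A3S2Y3": "[party head]"},
    "sports": {"T5A3S2Y3": "[pitcher]"},
}

def retrieve(index_word, address_cipher):
    """Scan only the memory enabled by the address cipher of the message."""
    classification = CLASS_BY_ADDRESS[address_cipher]
    return MEMORIES[classification].get(index_word)

print(retrieve("T5A3S2Y3", "GAI"))
```

Because the sports memory is never consulted for a message addressed "GAI", the homonym "[pitcher]" cannot be retrieved for it.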
(3) Selection according to meaning information of words in the input sentence
In the system of the invention, vocabulary information stored in the vocabulary memory 33 may contain meaning information of a word. Thus, by storing meaning information for a word which has a homonym associated therewith, meaning information may be appended in parentheses to a word for which the existence of a homonym is expected when formulating an input sentence. This permits the retrieval of vocabulary information having the meaning information which coincides with the meaning information added during the formulation.
Referring to the phrase "GENSYOO ((HERU))" (meaning a "reduction" or "reduce") appearing in the fifteenth line of the input sentence I shown in Fig. 2a, when the hiragana clause "(G4N0S2Y5A3 ((H4R3)))" corresponding thereto is supplied from the clause splitting section 32 to the vocabulary memory 33, a search is made to find whether vocabulary information having the same index word is stored therein. Assuming that the memory 33 stores two items of vocabulary information having the same index word but different meaning information, as illustrated in Table 8, it is possible to retrieve the vocabulary information whose meaning information coincides with the meaning information contained in the input sentence.
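The matching on both index word and meaning information can be sketched as below. Table 8 is not reproduced here, so the entry layout is an assumption; the two Japanese-code words "[reduction]" and "[phenomenon]" are those named later in the description, and the absence of meaning information on the second entry is hypothetical.

```python
ENTRIES = [
    {"index": "G4N0S2Y5A3", "meaning": "H4R3", "japanese": "[reduction]"},
    {"index": "G4N0S2Y5A3", "meaning": None, "japanese": "[phenomenon]"},  # hypothetical
]

def parse_clause(clause):
    """Split 'G4N0S2Y5A3 ((H4R3))' into (index_word, meaning or None)."""
    word, _, rest = clause.partition("(")
    meaning = rest.strip("() ") or None
    return word.strip(), meaning

def select(clause, entries=ENTRIES):
    """Keep entries whose index word matches and, when the input carries
    meaning information, whose meaning information matches as well."""
    index, meaning = parse_clause(clause)
    hits = [e for e in entries if e["index"] == index]
    if meaning is not None:
        hits = [e for e in hits if e["meaning"] == meaning]
    return [e["japanese"] for e in hits]

print(select("G4N0S2Y5A3 ((H4R3))"))
print(select("G4N0S2Y5A3"))
```

Without the parenthesised meaning information both homonyms remain, which is the situation the later selection steps have to resolve.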
(4) Selection according to composite words stored
In the system of the invention, the vocabulary memory 33 stores an index word in a form separated into a pre-index and a post-index for those words which can be used as a composite word in combination with another word, when storing vocabulary information. By way of example, when a composite word "KOKUSAI KIKAN" (international organization) as illustrated in Table 9 is inputted, a correct word can be selected from among homonyms. Specific examples will be described below.
Assume that there are five composite words which can be retrieved from the vocabulary memory 33 for a pre-index "(K5K3S1A2)" (international) when it is supplied to the clause splitting section 32. The words having the pre-index also include ones which have a post-index. When a word which may form a composite word is involved, the vocabulary information for "(K5K3S1A2)" is temporarily stored in the vocabulary memory 33 rather than being fed to the clause splitting section 32, while an instruction is provided to the kana clause input section 31 to feed the next clause into the clause splitting section 32 and into the vocabulary memory 33. In the present example, the next clause is "(K2K1N0)" (organization). Of the five homonyms for the index word "(K5K3S1A2)" which are temporarily stored, a coincidence is achieved with the index word for the vocabulary information of the Japanese code [international organization], which has the post-index "(K2K1N0)". In this manner, this vocabulary information is supplied to the clause splitting section 32. It is to be noted that other homonyms for the index word "(K5K3S1A2)" are deleted when the word "(K5K3S1A2 K2K1N0)" (international organization) is supplied to the clause splitting section 32. Obviously, it is also possible to retrieve "(K5K3S1A2K2K1N0)" as a single clause, by storing it as vocabulary information in the vocabulary memory 33.
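The hold-and-look-ahead behaviour described above can be sketched as follows. Table 9 is not reproduced here, so only the "[international organization]" entry is taken from the text; the other entries, the field names and the function name are hypothetical.

```python
ENTRIES = [
    {"pre": "K5K3S1A2", "post": None, "japanese": "[international]"},       # hypothetical
    {"pre": "K5K3S1A2", "post": "K2K1N0", "japanese": "[international organization]"},
    {"pre": "K5K3S1A2", "post": "X1X1", "japanese": "[other composite]"},   # hypothetical
]

def convert(clauses, entries=ENTRIES):
    """Hold the candidates for a clause, then compare their post-index
    with the next clause; on a match the composite word wins and the
    other held candidates are discarded."""
    out, i = [], 0
    while i < len(clauses):
        held = [e for e in entries if e["pre"] == clauses[i]]
        nxt = clauses[i + 1] if i + 1 < len(clauses) else None
        composite = [e for e in held if e["post"] is not None and e["post"] == nxt]
        if composite:
            out.append(composite[0]["japanese"])
            i += 2                      # both clauses are consumed
        else:
            plain = [e["japanese"] for e in held if e["post"] is None]
            out.append(plain[0] if plain else clauses[i])
            i += 1
    return out

print(convert(["K5K3S1A2", "K2K1N0"]))
```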
(5) Selection according to the preferential conjugation of an adjunctive
When formulating an input sentence to the system of the invention, an adjunctive is fundamentally formed as a single clause together with an independent word, immediately following the latter. However, in practice, an adjunctive may complete a clause inadvertently. By way of example, the clause "SEIKOO SITA" (made a success) appearing in the eleventh line of the input sentence shown in Fig. 1a must be written in one clause, but in actuality, it comprises an independent word and an adjunctive separately. When converting such clauses represented in Roman characters into a kanji and kana compounded sentence, a homonym [under] which corresponds to "SITA" may be retrieved, resulting in [under a success].
To avoid such difficulty, the vocabulary memory 33 also stores adjunctives, so that an adjunctive is made an object of the conjugation examination in the section 36 in the same manner as an independent word is made an object of such examination. In the example given above, an independent word "(S2T1)" and an adjunctive "(S2T1)" corresponding to "SITA" are stored in the vocabulary memory 33. When the hiragana clause "(S4A2K5A3)" corresponding to "SEIKOO" is fed to the clause splitting section 32, "(S4A2K5A3)" is retrieved from the vocabulary memory 33 and, because the clause is constituted by one word, is fed through the independent word/adjunctive catenation examining section 34 to the independent word/non-adjunctive conjugation examining section 36 together with its additional information. Assume that when "(S2T1)" is subsequently fed to the clause splitting section 32, two independent words "(S2T1)" and an adjunctive "(S2T1)" are retrieved from the vocabulary memory 33. Since each of them constitutes a single word with a single clause, they are fed through the examining section 34 to the examining section 36 together with the additional information. A conjugation of "(S4A2K5A3)" and "(S2T1)" is examined in the examining section 36. On the basis of the variety and part of speech information of "(S4A2K5A3)" (an independent word, the stem of s-series inflected verb) and "(S2T1)" (adjunctive, the inflection of s-series inflected verb), they are determined to be catenable, and accordingly the Japanese code for the adjunctive "(S2T1)" is selected preferentially over the independent words "[under]" and "[tongue]" which are homonyms. As a similar example, when the clauses "TUZUKETEIRUNONI TAISI" ("while it continues") appearing in the twelfth line of the sentence shown in Fig. 2a are converted, a homonym "[ambassador]" may be retrieved for "TAISI" if the described selecting function is not provided.
In the present system, the memory 33 stores an adjunctive "[while]" together with an independent word "[ambassador]" having an index word "(T1A2S2)" which corresponds to "TAISI", thus permitting a correct conversion into a kanji and kana compounded sentence.
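The preference for an adjunctive that is catenable with the preceding word can be sketched as below. The part of speech labels, the catenability table and the field names are hypothetical stand-ins for the variety and part of speech information held in the memories; only the three homonyms for "(S2T1)" come from the text.

```python
CANDIDATES = {
    "S2T1": [
        {"kind": "independent", "pos": "noun", "japanese": "[under]"},
        {"kind": "independent", "pos": "noun", "japanese": "[tongue]"},
        {"kind": "adjunctive", "pos": "s-verb inflection", "japanese": "SITA in hiragana"},
    ]
}

# Hypothetical table of (preceding part of speech, adjunctive part of speech)
# pairs that are catenable.
CATENABLE = {("s-verb stem", "s-verb inflection")}

def select(prev_pos, index_word):
    """Prefer an adjunctive that is catenable with the preceding word;
    otherwise leave all candidates for the later selection steps."""
    candidates = CANDIDATES[index_word]
    for c in candidates:
        if c["kind"] == "adjunctive" and (prev_pos, c["pos"]) in CATENABLE:
            return [c]
    return candidates

print(select("s-verb stem", "S2T1"))   # after "SEIKOO": the adjunctive wins
```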
(6) Selection according to preferential outputting of word
This represents a selection scheme which is broadly similar to the selection according to the relationship with the adjunctive mentioned in the immediately preceding paragraph. According to this selection scheme, a special character, for example, "0" is contained in the part of speech of the vocabulary information for one of homonyms, so that a word having "0" is preferentially selected when there are a number of homonyms.
Describing this more specifically with reference to "(A1R3 T5S2)" corresponding to "ARU TOSI" appearing in the seventh line of the input sentence shown in Fig. 4a, the vocabulary memory 33 stores vocabulary information of the Japanese code for "[year]" and "[city]" for "(T5S2)" as homonyms. When "(T5S2)" is specified with the part of speech "0", the independent word/non-adjunctive conjugation examining section 36 examines the conjugation capability of "(A1R3)" and "(T5S2)" based on the variety and part of speech information, permitting a preferential selection of a word specified with the part of speech "0" from among homonyms.
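A minimal sketch of this preferential selection follows. The text does not state which of "[year]" and "[city]" carries the mark "0", so the assignment below is arbitrary and purely illustrative, as are the part of speech strings and the function name.

```python
def prefer_marked(candidates, mark="0"):
    """Return the homonyms whose part of speech contains the mark;
    when none is marked, all candidates are kept."""
    marked = [c for c in candidates if mark in c["pos"]]
    return marked if marked else candidates

# Arbitrary, hypothetical marking for the index word "(T5S2)".
tosi = [
    {"japanese": "[year]", "pos": "N0"},
    {"japanese": "[city]", "pos": "N"},
]
print(prefer_marked(tosi))
```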
(7) Selection according to concept information
In the present system, a correct selection is made from homonyms by comparison of one part against another part of a clause as well as by the comparison of adjacent clauses. In addition, a correct selection of a homonym is assisted by the comparison of phrases. This utilizes the "concept information" of vocabulary information stored in the vocabulary memory 33. This selection will be more specifically described.
Reference is made to the ninth to thirteenth lines of the input sentence shown in Fig. 4a which is recapitulated below.
SONOBADE, SEKIYU DAITAI ENERGY GA KAIHATSU SARETE SEKIYU KAKAKUMO OCHITUKU TAISEIGA DEKIRUMADE, KIKENNA HASITO MIRARERU "KEIZAI ENJYO MONDAI" WO SAKETE WATARUKOTOGA TOOGI SARETA. PARA
It may be possible that homonyms "[bridge]", "[chop stick]" and "[end]" corresponding to the clause "HASITO" are brought into the concept information examining section 38 as a result of a failure to select a proper homonym in the portion of the system from the clause splitting section 32 to the word construction determination section 37. In the present system, the examining section 38 permits "[bridge]" to be selected based on the concept information of "WATARUKOTOGA" (cross), which appears seven clauses after "HASITO".
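The comparison of concept information across a phrase can be sketched as follows. The concept tags attached to each word are hypothetical; the text states only that "[bridge]" is selected on the basis of the concept information of "WATARUKOTOGA" (cross).

```python
def select_by_concept(candidates, phrase_concepts):
    """Keep the homonyms that share at least one concept with the other
    clauses of the phrase; when none matches, keep all candidates."""
    hits = [c for c in candidates if c["concepts"] & phrase_concepts]
    return hits if hits else candidates

# Hypothetical concept tags for the three homonyms of "HASI".
hasi = [
    {"japanese": "[bridge]", "concepts": {"cross"}},
    {"japanese": "[chop stick]", "concepts": {"eat"}},
    {"japanese": "[end]", "concepts": {"edge"}},
]

# Concepts collected from the other clauses of the phrase, here only
# the one assumed for "WATARUKOTOGA".
print(select_by_concept(hasi, {"cross"}))
```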
(8) Selection utilizing the catenability examination based on the part of speech information
The examination of the catenability in accordance with the part of speech information which takes place in the independent word/adjunctive catenation examining section 34 has been already described in terms of examples, but can be utilized to permit a proper selection of homonym.
This will be illustrated with reference to the clause "(A2A2N0K1A2H1)" corresponding to "IYINKAIWA" appearing in the fifth line of the sentence shown in Fig. 1a. When a clause of the sentence shown in Fig. 3a, for example, is fed to the clause splitting section 32, "(K5N0G5)" is retrieved from the vocabulary memory 33, and this clause is split into "(K5N0G5)" and "(W1)" in the splitting section 32. Then "(H1)" is retrieved from the grammatical information memory 35 for the adjunctive "(W1)" and is examined as to its catenability with "(K5N0G5)". The catenation is found possible, whereby "(W1)" is replaced by "(H1)". In a similar manner, "(A5)" and "(A3)" may be replaced by "(W5)" and "(H3)", respectively.
(9) Selection utilizing the examination of conjugation based on the part of speech information
The examination of conjugation based on the part of speech information has been described previously in connection with the independent word/non-adjunctive conjugation examining section 36 for several examples, and can be utilized to achieve a selection of a homonym. This selection can be accomplished by utilizing the conjugation of an independent word with a pre-fix or post-fix, the conjugation of an independent word with an adjunctive or the conjugation of an adjunctive with another adjunctive. A specific example to illustrate the selection according to the part of speech information will be given below.
Considering the hiragana clause "(S3Z3K2 S4N0S2Y3G1)" corresponding to "SUZUKI SENSHUGA" appearing in the sixth line of the sentence shown in Fig. 3a, as the clause "(S3Z3K2)" is fed to the clause splitting section 32, vocabulary information having an index word for "(S3Z3K2)" is retrieved from the vocabulary memory 33. Since it forms a single word with a single clause, it is fed from the splitting section 32 through the examining section 34 to the examining section 36 together with its additional information. Subsequently, when the next clause "(S4N0S2Y3G1)" is fed to the clause splitting section 32, vocabulary information having an index word for "(S4N0S2Y3)" is retrieved from the vocabulary memory 33. Assume that four items of vocabulary information for "(S4N0S2Y3)" have been retrieved as illustrated in Table 11. Examining the conjugation capability of the part of speech information for "(S3Z3K2)" and "(S4N0S2Y3)", it is found that "(S4N0S2Y3)" (player) has the highest priority for conjugation with "(S3Z3K2)". In this manner, vocabulary information having the Japanese code for "[player]" can be properly selected in the preferential manner over other homonyms in accordance with the part of speech information.
(10) Selection of homonym by "VX", "LX" and "NX"
The functional words "VX", "LX" and "NX" are usually entered when formulating an input sentence, but if they are not entered, they can be automatically added within the present system to permit a selection of a homonym.
Referring to "EC (G1W1)" corresponding to "EC GAWA" (which means "EC side") appearing in the eleventh line of the sentence shown in Fig. 2a to illustrate the use of the functional word "VX", it is desirable that the input clause "EC" be outputted from the present system in the original form "EC" using Roman characters. This may be easily achieved by inputting "ECVX". However, the exemplary sentence shown does not contain "VX", and therefore the functional word "VX" is automatically added to "EC" within the storage determination section 14. When the clause "ECVX" having the added functional word "VX" is fed to the kana clause input section 31, it is passed through the clause splitting section 32 and the examining section 34 to the examining section 36 where it is stored temporarily. Subsequently, when the next clause "(G1W1)" is supplied to the clause splitting section 32, four entries for "(G1W1)" may be retrieved from the vocabulary memory 33 as illustrated in Table 12. Since each of these constitutes a word with a single clause, they are fed through the examining section 34 to the examining section 36 together with their additional information. The conjugation examining section 36 selects "(G1W1)" (side) having part of speech information which corresponds to "VX", and rejects the remaining three words for "(G1W1)".
Referring to the clause "FLORIDA SHUUDE" appearing in the seventh line of the sentence shown in Fig. 4a, the use of the functional word "LX" will be described. When the clause "FLORIDA" is inputted, the input converted word storage determination section 14 retrieves "FURORIDALX" therefor, which is fed to the independent word/non-adjunctive conjugation examining section 36.
Assume that five entries for "(S2W3A3)" are retrieved from the vocabulary memory 33 as illustrated in Table 13 when "(S2W3A3D4)" corresponding to the next clause "SHUUDE" is fed to the clause splitting section 32. The section 32 splits the clause "(S2W3A3D4)" into "(S2W3A3)" and "(D4)", which are fed to the examining section 34 together with their additional information. The section 34 determines that "(S2W3A3)" is catenable with "(D4)", supplying them to the examining section 36.
The examining section 36 must determine which one of the entries for "(S2W3A3D4)" is to be selected for conjugation with ((FLORIDA)). Of the five entries for "(S2W3A3)", only "(S2W3A3)" (state) has part of speech information corresponding to "LX", and that word is selected. In this manner, a proper selection from among five homonyms for "(S2W3A3)" can be made by automatically adding "LX" to the preceding clause.
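The use of a functional word on the preceding clause to pick the following homonym can be sketched as below. Tables 12 and 13 are not reproduced here, so only the "[state]" entry is taken from the text; the placeholder homonyms, the "follows" field and the function names are hypothetical.

```python
FUNCTIONAL_WORDS = ("VX", "LX", "NX")

def functional_word(clause):
    """Return the functional word that ends the clause, if any."""
    for tag in FUNCTIONAL_WORDS:
        if clause.endswith(tag):
            return tag
    return None

def select_following(prev_clause, candidates):
    """Keep the homonyms whose part of speech information corresponds to
    the functional word of the preceding clause."""
    tag = functional_word(prev_clause)
    if tag is None:
        return candidates
    hits = [c for c in candidates if c["follows"] == tag]
    return hits if hits else candidates

# Entries for "(S2W3A3)"; all but "[state]" are placeholders.
shuu = [
    {"japanese": "[state]", "follows": "LX"},
    {"japanese": "[other homonym 1]", "follows": None},
    {"japanese": "[other homonym 2]", "follows": None},
]
print(select_following("FURORIDALX", shuu))
```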
(11) Selection of homonym impossible
When the selection of a homonym cannot be achieved even though the various selecting functions described above are exercised, the present system outputs two words, each in parentheses.
A specific example will be given, considering the input sentence shown in Fig. 4a. When "(T1A2S4A2G1)" corresponding to "TAISEIGA" appearing in the tenth line is fed from the clause splitting section 32 to the vocabulary memory 33, the latter may store four kinds of vocabulary information as illustrated in Table 14. Of these, "(T1A2S4A2)" (system) belongs to the classification of sports terms and "(T1A2S4A2)" (resistance) belongs to the classification of scientific terms. When such assumptions are made, the remaining two words are retrieved using the above mentioned selection schemes. The clause splitting section 32 splits the clause "(T1A2S4A2G1)" into "(T1A2S4A2)" and "(G1)", and feeds them to the independent word/adjunctive catenation examining section 34 together with the additional information for "(T1A2S4A2)". They are fed from the section 34 to the examining section 36 and thence through the word construction determination section 37 and the concept information examining section 38. Assuming that the selection of a homonym cannot be achieved by the consideration of the word construction and grammatical connection as extended over the preceding and following clauses, and that no concept information has been added, the two word determination section 39 puts parentheses before and after the respective homonyms with a demarcation mark between them and supplies them to the kanji conversion section 310. Thereupon, the conversion section 310 outputs "[attitude] [system]".
While a failure of selecting a correct homonym has been assumed in this example, it should be understood that a correct selection can be made in this instance by adding concept information to the input sentence in the input stage and storing corresponding concept information in the vocabulary information.
Whenever more than one homonym is fed to the two word determination section 39, the two words having the highest frequency of use are selected. For this reason, the vocabulary is stored in the vocabulary memory 33 in a manner to permit an indexing in the sequence of frequency of use.
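The fallback output of the two most frequent homonyms can be sketched as follows. The frequency figures are hypothetical, and the exact output layout (each word in parentheses, joined by a period) is an assumption based on the description of the two word determination section 39.

```python
def two_word_output(homonyms):
    """homonyms: list of (word, frequency_of_use) pairs.
    A single candidate is output as is; otherwise the two most frequent
    words are each put in parentheses with a period between them, and
    any further homonyms are dropped."""
    ranked = sorted(homonyms, key=lambda h: h[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]
    return ".".join(f"({word})" for word, _ in ranked[:2])

# Hypothetical frequencies for homonyms of "TAISEI".
print(two_word_output([("system", 5), ("attitude", 9), ("resistance", 1)]))
```

Storing the vocabulary in order of frequency of use, as the text describes, would make the sort unnecessary: the first two retrieved entries would already be the ones to output.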
b. Detail of kana-to-kanji conversion stage
(1) Kana clause input section, kana splitting section and vocabulary memory (Figs. 16 and 17)
Fig. 16 shows the circuit diagram of the kana clause input section 31 and the clause splitting section 32, which comprise first to fourth input registers 151 to 154, a kana clause input circuit 155, a KX determination circuit 156, a word splitting circuit 157, a word combining circuit 159 and an output register 1510.
As shown in Fig. 17, the vocabulary memory 33 comprises an input register 161, O determination circuit 162, one word register 163, address code determination circuit 164, a composite word determination circuit 165, vocabulary information storage circuit 166 including a plurality of counters 170, a first output register 167 feeding a word splitting circuit 157 of the kana splitting section 32, and a second output register 168 feeding a word combining circuit 159 of the kana splitting section 32. The O determination circuit 162 functions to determine the position at which a clause is split for purpose of scanning vocabulary information when a clause is re-entered from the examining section 36. The address code determination circuit 164 determines the address code of each message. The composite word determination circuit 165 temporarily stores vocabulary information to determine whether or not it is adequate to form a composite word.
Considering the vocabulary information storage circuit 166, it stores a multitude of words and composite words in the form of vocabulary information, each item comprising an index word with its associated additional information, which are stored in given data tracks 169. An item of vocabulary information comprises an index word and additional information which includes the part of speech, variety, classification, meaning, concept and the Japanese code of kanji. The vocabulary is stored in independent groups representing the field in which the corresponding words find their principal application, such as "fundamental words", "economical words", "sports terms", "scientific terms" and the like. The number of items of vocabulary information will amount to ten thousand or several tens of thousands, since a word is usually constituted by logographs or kanji.
A specific example will be described in detail for the clause "MACHIGAINAI" appearing in the input sentence shown in Fig. 2a. When the hiragana clause "(M1T2G1A2N1A2)" corresponding to this input clause is fed from the first input register 151 to the KX determination circuit 156 through the circuit 155, the absence of "KX" added to this clause causes it to be fed to the word splitting circuit 157 and the vocabulary memory 33 through the second input register 152. When the clause "(M1T2G1A2N1A2)" is fed from the input register 161 of the vocabulary memory 33 (Fig. 17) through the O determination circuit 162 to the one word register 163, it is further fed thence to a counter and index comparison circuit 170, which will be hereinafter referred to in abbreviated form as "counter". It checks vocabulary information, one item at a time, from the storage circuit 166, searching for vocabulary information having an index word which coincides with the clause "(M1T2G1A2N1A2)" supplied thereto, while incrementing the counter. It is to be understood that a plurality of vocabulary information memories 169 are each associated with a counter 170 according to the classification as mentioned above, with the address code of each message specifying the classification. Assuming that a message having the address code "GAI" specifies the classification of fundamental words, a command from the address code determination circuit 164 enables only that vocabulary information memory 169 and counter 170 which store the fundamental words, disabling vocabulary information memories and counters of the other classifications. Continuing with the description, when the coincidence is not attained for the input clause "(M1T2G1A2N1A2)" in its original form, one character is removed from the end to use "(M1T2G1A2N1)" to see if the coincidence is reached therewith, while incrementing the counter 170. This procedure is repeated while sequentially removing one character from the end of the preceding clause.
Assuming that the coincidence is reached for "(M1T2G1A2)", the counter 170 is reset, and the vocabulary information found is fed to the one word register 163, which passes it to the word splitting circuit 157 through the first output register 167. The retrieved vocabulary information for "(M1T2G1A2)" is also fed from the one word register 163 to the word combining circuit 159 through the composite word determination circuit 165 and the second output register 168.
As the clause "(M1T2G1A2)" is fed from the vocabulary memory 33 to the word splitting circuit 157, it separates the word from the remaining portions. Thus, the portion "(N1A2)" is separated from "(M1T2G1A2N1A2)" and is fed to the word combining circuit 159, which combines it with the vocabulary information for "(M1T2G1A2)" which has been separately fed from the vocabulary memory 33, supplying the combination to the output register 1510 (Fig. 16). The third and fourth input registers 153, 154 are used for returning a signal which has once been fed from the output register 1510 to the succeeding circuit, in order to process the signal again in the word splitting circuit 157 and the vocabulary memory 33.
The function of selecting a homonym in the vocabulary memory 33 will now be described in terms of specific examples. The selection according to the specified classification will be described with reference to a hiragana clause "(T5A3S2Y3 K1A2D1N0A5)" corresponding to "TOOSHU KAIDANWO" (meeting of party heads) which appears in the input sentence shown in Fig. 4a. Assume that the clause "(T5A3S2Y3)" is passed through the first input register 151, the kana clause input circuit 155, the KX determination circuit 156 and the second input register 152 to the word splitting circuit 157 and vocabulary memory 33 shown in Fig. 16. The vocabulary information storage circuit 166 of the vocabulary memory 33 stores a multitude of vocabulary information in groups according to the preassigned classifications. In the present system, the address code determination circuit 164 responds to "KX [foreign news]" corresponding to the address code "GAI", for example, by determining that the classification of "fundamental word" is specified by this address, enabling the counter 170 which belongs to this classification for operation.
Referring to Fig. 17, assuming that a group of counters designated A belongs to the "fundamental words", the vocabulary memory 169 which is operatively connected with the counter A operates alone while the vocabulary memories associated with other counters do not operate. When "(T5A3S2Y3)" is passed through the O determination circuit 162 and the one word register 163 to the counter A, a scan of the vocabulary information stored in the group of the counter A is made in sequence. When a coincidence is reached with vocabulary information "(T5A3S2Y3)" having the Japanese code "[party head]" as additional information, it is retrieved. Though the vocabulary information storage circuit 166 may store a clause "(T5A3S2Y3)" having the Japanese code "[pitcher]", it is not retrieved since it is listed under the classification of "sports terms", which is different from the specified classification.
The selection utilizing the meaning information will be described below with reference to "(G4N0S2Y5A3 (H4R3))" corresponding to "(GENSYOO (HERU))" of the input sentence shown in Fig. 2a. When the clause "(G4N0S2Y5A3 (H4R3))" is passed from the input register 161 of the vocabulary memory 33 through the O determination circuit 162 and the one word register 163 to the counter A of the vocabulary information storage circuit 166, the counter A is already fed with the meaning information "(H4R3)", so that it scans the vocabulary information while incrementing, to see if a coincidence is reached between the input clause and the group of vocabulary information for both the index word and the meaning information. Suppose that the group of vocabulary information corresponding to the counter A includes two words for "(G4N0S2Y5A3)", one having the Japanese code "[reduction]" and the other having the Japanese code "[phenomenon]". Of these two, the coincidence for both the index word and the meaning information is reached only for "(G4N0S2Y5A3)" (reduction), which is therefore retrieved. In this manner, a selection and retrieval of a correct word can be positively made by utilizing the meaning information even though words of the same phonation may be stored within the same classification.
The selection utilizing the composite word construction will be described with reference to "(K5K3S1A2 K2K1N0)" (international organization). When "(K5K3S1A2)" is inputted to the word splitting circuit 157 and the vocabulary memory 33 of Fig. 16, it is assumed that five items of vocabulary information are retrievable from the vocabulary information storage circuit 166 of the memory 33, as illustrated in Table 9. These are fed to the composite word determination circuit 165 shown in Fig. 17, and they are temporarily stored therein whenever any particular item of vocabulary information contains a post-index indicating the possibility of forming a composite word. Then the next clause "(K2K1N0)" is fed from the kana clause input circuit 155 of Fig. 16 to the word splitting circuit 157 and the vocabulary memory 33. The clause "(K2K1N0)" is passed through the input register 161, the O determination circuit 162 and the one word register 163 of Fig. 17 to the composite word determination circuit 165, where a determination is made whether "(K2K1N0)" can form a composite word with "(K5K3S1A2)". Since one of the five items of vocabulary information has a post-index "(K2K1N0)" in coincidence with the post-index word "(K2K1N0)" of "(K5K3S1A2)", this circuit determines that a composite word "(K5K3S1A2 K2K1N0)" (international organization) can be formed. Then the remaining four homonyms are rejected.
(2) Independent word/adjunctive catenation examining section and grammatical information memory (Fig. 18)
The independent word/adjunctive catenation examining section 34 and the grammatical information memory 35 include first and second input registers 171, 173, a KX determination circuit 172, a vocabulary information determination circuit 174, an independent word/adjunctive catenation examining circuit 175 and an output register 177. The specific circuit arrangement of the grammatical information memory 35 is similar in principle to that of the vocabulary memory 33 mentioned above, but it stores a number of adjunctive informations and part of speech informations. The independent word/adjunctive catenation examining circuit 175 examines whether or not an independent word is catenable with an adjunctive and also provides temporary storage of the independent word.
Referring to the hiragana clause "(M1T2G1A2N1A2)" chosen in the previous example, the absence of "KX" in this clause causes it to be passed from the first input register 171 through the KX determination circuit 172 and the second input register 173 to the vocabulary information determination circuit 174. The vocabulary information for "(M1T2G1A2)" which has been processed in the preceding stage is subjected to a determination by the circuit 174 before being fed to the independent word/adjunctive catenation examining circuit 175, while the remaining "(N1A2)" is fed from this circuit to the grammatical information memory 35. The grammatical information memory 35 is scanned for "(N1A2)", which however cannot be retrieved. Therefore, a scan is made for "(N1)". Again the scan fails, so that the grammatical information memory 35 returns "(N1A2)" to the third input register 153 of the word splitting section 32.
At this time, the vocabulary information for "(M1T2G1A2)" which has been temporarily stored in the examining circuit 175 is fed through the output register 177 to the next succeeding section 36. On the other hand, "(N1A2)" is returned through the third input register 153 to the second input register 152 of Fig. 16 while an input from the KX determination circuit 172 to the second input register 173 is temporarily interrupted. "(N1A2)" fed back to the second input register 152 is thence fed to the word splitting circuit 157 and the vocabulary memory 33, and a search of the vocabulary information storage circuit 166 of the latter is made to retrieve the vocabulary information for "(N1A2)", which is then fed to the word conjugating circuit 159. Since it is unnecessary to split "(N1A2)", the transmission to the word splitting circuit 157 is omitted. The vocabulary information for "(N1A2)" supplied to the circuit 159 is then fed through the output register 1510 to the first input register 171 of Fig. 18. Subsequently, after it has been passed through the circuits 172 and 173, it is determined to represent vocabulary information by the vocabulary information determination circuit 174, and is then fed to the independent word/adjunctive catenation examining circuit 175. Since there is no adjunctive which is catenable with "(N1A2)", it is not subjected to any processing and is directly fed to the output register 177.
(3) Independent word/non-adjunctive conjugation examining section (Fig. 19)
This section comprises first and second input registers 181, 182, an independent word/non-adjunctive conjugation examining circuit 183, an output register 185 and an additional information deletion circuit 186. As mentioned previously, the grammatical information memory 35 stores a group of part of speech informations having various kinds of parts of speech as index words. The examining circuit 183 operates to see whether the conjugation between an independent word and any word other than an adjunctive is possible from the standpoints of part of speech information and variety information. This circuit includes an O mark addition circuit. The addition of the O mark takes place automatically one character before the end of a word whenever the examining circuit 183 has determined that the conjugation is impossible.
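The examination on the basis of part of speech information may be sketched as a simple table look-up. The part of speech names and the contents of the table `CATENABLE` below are purely hypothetical, since the actual part of speech informations are not listed in this passage, and variety information is left out of the sketch.

```python
# Hypothetical part of speech information: each part of speech used as an
# index word maps to the set of parts of speech that may follow it.
CATENABLE = {
    "noun": {"noun", "suffix"},
    "verb stem": {"verb ending"},
}

def can_conjugate(first_pos, second_pos, table=CATENABLE):
    """Return True if a word of part of speech second_pos may follow
    a word of part of speech first_pos according to the table."""
    return second_pos in table.get(first_pos, set())

possible = can_conjugate("noun", "suffix")
```

A pair that is not listed, or an index word missing from the table, is treated as not conjugable, which corresponds to the case in which the O mark would be added.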
Considering the clause "(M1T2G1A2N1A2)", it is supplied from the output register 177 of Fig. 18 through the first and second input registers 181, 182 of Fig. 19 to both the examining circuit 183 and the grammatical information memory 35. The vocabulary information "(N1A2)" is also fed to both of these circuits. The part of speech and variety information are retrieved from the grammatical information memory 35 for both "(M1T2G1A2)" and "(N1A2)", and are fed to the examining circuit 183 to see whether "(M1T2G1A2)" can be conjugated with "(N1A2)". As mentioned previously, these clauses cannot be conjugated together. For this reason, O mark addition is made to provide "(M1T2G1OA2)", and the vocabulary information of "(M1T2G1A2)" and "(N1A2)" is fed to the deletion circuit 186, which deletes the additional information, leaving only the index word. In addition, it combines both index words together to provide "(M1T2G1A2N1A2)" as a single clause output, which is then fed through the fourth input register 154 to the second input register 152 of Fig. 16 again. In the circuit of Fig. 16, "(M1T2G1A2)" was initially scanned, and therefore, to avoid a repetition of the same result if the clause "(M1T2G1A2N1A2)" is used for the scan, the mark "O" is added to permit a scan for "(M1T2G1)".
At this time, the input register 152 temporarily interrupts an input from the KX determination circuit 172. The clause is fed from the input register 152 to the word splitting circuit 157 and the vocabulary memory 33. When the clause "(M1T2G1OA2N1A2)" is supplied through the input register 161 to the O determination circuit 162, the latter detects the mark "O" and, removing the symbol and the characters which follow "(M1T2G1)", supplies the clause "(M1T2G1)" through the one word register 163 to the counter 170 of the vocabulary information storage circuit 166. The storage circuit 166 operates to increment the counter 170 sequentially to see whether a coincidence of index word is found. When a coincidence is found, the corresponding vocabulary information is passed through the composite word determination circuit 165 and the second output register 168 to the word conjugating circuit 159. The counter 170 is then reset and the information retrieved is fed to the one word register 163 to supply "(M1T2G1)" to the word splitting circuit 157 through the first output register 167. The word splitting circuit 157 of Fig. 16 operates to split the original clause into "(M1T2G1)" and "(A2N1A2)", removing the mark "O", and feeds "(A2N1A2)" alone to the word combining circuit 159, where the vocabulary information for "(M1T2G1)" is combined with "(A2N1A2)", with the combination being fed through the output register 1510 to the first input register 171 of Fig. 18. It is thence fed through the KX determination circuit 172 and the second input register 173 to the vocabulary information determination circuit 174. "(M1T2G1)" in the vocabulary information is supplied to the independent word/adjunctive examining circuit 175 while "(A2N1A2)" is supplied to the grammatical information memory 35.
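The handling of the O mark described above may be sketched in a few lines. The sketch assumes, for illustration only, that each kana is written as one letter followed by one digit and that the letter "O" does not otherwise occur in a kana-coded clause; the function names are hypothetical.

```python
def add_o_mark(word):
    """Insert the O mark one kana (letter plus digit) before the end of a
    word, as done when the conjugation is found to be impossible."""
    return word[:-2] + "O" + word[-2:]

def split_at_o_mark(clause):
    """Split a clause at the O mark: the part before the mark is scanned
    as an index word, and the part after it is the remainder of the clause."""
    head, mark, tail = clause.partition("O")
    if not mark:
        return clause, ""
    return head, tail

marked = add_o_mark("M1T2G1A2")
head, tail = split_at_o_mark(marked + "N1A2")
```

Here `marked` is "M1T2G1OA2", and splitting the marked clause gives "M1T2G1" for the new scan and "A2N1A2" as the remainder.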
A scan of the group of adjunctive informations stored in the grammatical information memory 35 is made by incrementing the counter, and if no coincidence is found, the last character is removed to re-initiate a scan for "(A2N1)". If an adjunctive information for "(A2N1)" is found, it is fed to the examining circuit 175.
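The repeated scan with removal of the last character amounts to a search for the longest leading part of the remainder that is stored as an adjunctive. A minimal sketch follows, assuming that one character corresponds to one letter plus one digit and that the set `ADJUNCTIVES` (a hypothetical stand-in for the adjunctive informations of the grammatical information memory 35) contains "A2N1" and "A2".

```python
ADJUNCTIVES = {"A2N1", "A2"}  # hypothetical adjunctive index words

def split_kana(code):
    """Split a kana-coded string into units of one letter plus one digit."""
    return [code[i:i + 2] for i in range(0, len(code), 2)]

def longest_adjunctive(remainder, adjunctives=ADJUNCTIVES):
    """Scan for the whole remainder first; on failure drop the last kana
    and scan again. Returns (adjunctive found or None, what is left)."""
    units = split_kana(remainder)
    for end in range(len(units), 0, -1):
        candidate = "".join(units[:end])
        if candidate in adjunctives:
            return candidate, "".join(units[end:])
    return None, remainder

found, rest = longest_adjunctive("A2N1A2")
```

With these assumed entries the scan for "A2N1A2" fails, the scan for "A2N1" succeeds, and "A2" is left for the next retrieval.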
Deriving the corresponding part of speech information from the grammatical information memory 35 using the part of speech information of "(M1T2G1)" and "(A2N1)" as index words, the examining circuit 175 checks whether they are catenable on the basis of the part of speech information. In the present example, the catenation is possible. Then the adjunctive information for "(A2)" is retrieved from the grammatical information memory 35 and is fed to the examining circuit 175. Two part of speech informations of the same kind which are retrieved from the grammatical information memory 35 for "(A2N1)" and "(A2)" are used to ex
Considering the word construction of both clauses, the first clause comprises an (independent word + adjunctive word) combination while the second comprises a (postword + postfix) combination. However, a scan of the word construction storage circuit 193 fails to find a coincidence with any of the reference word constructions stored therein.
When the scan fails in this manner, only the initial clause is left in its original form while the next clause is converted into hiragana. Thus, the clause "(K1K1R2T1A2)" is fed to the hiragana conversion circuit 194, where the kanji of the Japanese code contained in the vocabulary information of the two words "(K1K1R2)" and "(T1A2)" are converted into hiragana of the Japanese code, so that "(K1K1R2T1A2)" can be outputted in hiragana form when a kanji and kana compounded sentence is formulated in the next succeeding kanji conversion section 310.
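The word construction examination and its fallback may be pictured as a look-up of a pair of clause constructions in a set of reference constructions. The single reference pattern shown below is invented for illustration, and the labels "original" and "hiragana" are hypothetical names for the two outcomes described above.

```python
# Hypothetical reference word constructions: pairs of clause constructions
# that are accepted for two adjoining clauses.
REFERENCE_CONSTRUCTIONS = {
    (("independent word", "adjunctive"), ("independent word",)),
}

def examine_construction(first, second, references=REFERENCE_CONSTRUCTIONS):
    """Return how each of the two clauses is to be output. When no reference
    construction coincides, the first clause is left in its original form
    and the second clause is to be converted into hiragana."""
    if (tuple(first), tuple(second)) in references:
        return ("original", "original")
    return ("original", "hiragana")

result = examine_construction(
    ["independent word", "adjunctive"], ["postword", "postfix"]
)
```

For the pair of constructions in the example the scan fails under this assumed table, so `result` marks the second clause for hiragana conversion.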
(5) Concept information examining section (Fig. 21)
The concept information examining section 38 comprises an input register 201, a phrase temporary storage circuit 202, a concept information determination circuit 203, a vocabulary or adjunctive information temporary storage circuit 204 and an output register 205. The phrase temporary storage circuit 202 stores an input sentence in units of a phrase including the period, while the vocabulary or adjunctive information temporary storage circuit 204 temporarily stores each vocabulary or adjunctive information inputted. The concept information determination circuit 203 functions to determine whether a coincidence can be found for the concept information contained in each vocabulary or adjunctive information.
An exemplary operation will be described below with reference to "(S5N5B1D4 S4K2Y3...T5A3G2 S1R4T5 o)" corresponding to "SONOBADE, SEKIYU.. TOOGI SARETA." appearing in the ninth to thirteenth lines of the sentence shown in Fig. 4a. To aid understanding of the fundamental principle, it is assumed that, although each vocabulary or adjunctive information inherently contains concept information, only the vocabulary or adjunctive information of those words which appear in the following description contains such concept information. Thus, it is assumed that only "(H1S2)" and "(W1T1R3)" contain concept information. When the clause "(S5N5B1D4)" is supplied through the input register 201 to the storage circuits 202 and 204, the index word thereof is stored in the storage circuit 204.
Each clause is stored in the storage circuit 202 and the index word of each clause is temporarily stored in the circuit 204. Subsequently, the individual words of each clause temporarily stored in the storage circuit 202 are fed to the concept information determination circuit 203 in a sequential manner, and those words not containing concept information are delivered to the output register 205. When the clause "(H1S2)" which has concept information added thereto is fed from the storage circuit 202 to the concept information determination circuit 203, the index words stored in the storage circuit 204 are sequentially fed to the determination circuit 203 to see whether there is a coincidence with the concept information of "(H1S2)".
If there are three homonyms for "(H1S2)" as illustrated in Table 16, they are all fed to the concept information determination circuit 203. The index words derived from the vocabulary or adjunctive information temporary storage circuit 204 are sequentially compared against the concept information of the three homonyms "(H1S2)" to find a coincidence, which can be found only for that "(H1S2)" having the Japanese code of "[bridge]". Thus the two homonyms "(H1S2)" for (chop sticks) and (end) are rejected, and the word "(H1S2)" (bridge) is fed to the output register 205. This is because the concept information for "(H1S2)" (bridge) is "(W1T1R3)" (cross), which reaches a coincidence with the index word "(W1T1R3)" stored in the storage circuit 204, while the concept information of "(H1S2)" (chop sticks) is "(M5T3)" (hold) and "(H1S2)" (end) has no concept information. In this manner, "(H1S2)" (bridge) can be correctly selected from among three homonyms in correspondence to "(W1T1R3)", by utilizing the concept information.
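The selection by concept information may be sketched as follows. The contents of Table 16 are represented by a hypothetical list `HOMONYMS` with English glosses in place of the Japanese codes, the kana code for (cross) is written here as "W1T1R3", and the list of index words for the phrase is likewise only an assumed example.

```python
# Hypothetical stand-in for Table 16: (gloss, concept information or None).
HOMONYMS = [
    ("chop sticks", "M5T3"),  # concept: hold
    ("end", None),            # no concept information
    ("bridge", "W1T1R3"),     # concept: cross
]

def select_by_concept(homonyms, phrase_index_words):
    """Keep the homonyms whose concept information coincides with one of
    the index words temporarily stored for the same phrase."""
    stored = set(phrase_index_words)
    return [gloss for gloss, concept in homonyms
            if concept is not None and concept in stored]

# Assumed index words of the other clauses in the phrase.
selected = select_by_concept(HOMONYMS, ["S5N5B1D4", "H1S2", "W1T1R3"])
```

With these assumptions `selected` contains only "bridge"; if nothing coincides the list is empty and the homonyms remain unresolved for the following section.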
(6) Two word determination section (Fig. 22)
This section 39 comprises an input register 211, a two word determination circuit 212, a tautologic style addition circuit 213 and an output register 214. The two word determination circuit 212 functions to leave only two words and delete the third and subsequent words whenever more than two homonyms are supplied. The tautologic style addition circuit 213 adds "[" and "]" (square brackets) of the Japanese code before and after each of the two homonyms and adds a demarcation mark "" of the Japanese code between the two homonyms.
The operation of this section will be specifically described with reference to "(T1A2S4A2)" corresponding to "TAISEI" appearing in the tenth line of the sentence shown in Fig. 4a. Assuming that the circuits which precede the two word determination section 212 failed to make a proper choice from among homonyms and the clause "(T1A2S4A2)" is passed through the input register 211 to the determination circuit 212 of Fig. 22, it supplies two homonyms for "(T1A2S4A2)" which are in turn supplied to the tautologic style addition circuit 213. The latter adds "[" and "]" of the Japanese code before and after "(T1A2S4A2)" (attitude) and "(T1A2S4A2)" (system), each having additional information, and enters (demarcation mark) of the Japanese code between the two words. If three homonyms for "(T1A2S4A2)" are supplied to the circuit of Fig. 22, the first two words are employed while the third word is rejected. A word for which there is no homonym is directly passed to the output register 214 without being processed in any manner in the two word determination circuit 212.
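A minimal sketch of the two word determination and the tautologic style addition is given below. The demarcation mark itself is not legible in this text, so the character "/" used here is purely an assumption, and the function name is hypothetical.

```python
def tautologic_style(homonyms, mark="/"):
    """Keep at most the first two homonyms; when two remain, enclose each in
    square brackets and place the demarcation mark between them."""
    if len(homonyms) <= 1:
        # A word without a homonym passes through unprocessed.
        return "".join(homonyms)
    first, second = homonyms[:2]
    return f"[{first}]{mark}[{second}]"

output = tautologic_style(["attitude", "system", "third homonym"])
```

Under the assumed mark, `output` is "[attitude]/[system]", the third homonym having been rejected.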
(7) Kanji conversion section (Fig. 23)
This conversion section 310 comprises an input register 221, a KX determination circuit 222, a Japanese code determination circuit 223, a VX, LX and NX identification circuit 224, a Japanese code conversion circuit 225 and an output register 226. The KX determination circuit 222 detects a word with "KX" added thereto and deletes "KX", supplying only the Japanese code of the word directly to the output register 226. The Japanese code determination circuit 223 supplies only the Japanese code of each word to the output register 226, deleting all other additional information inclusive of the index word. The VX, LX and NX identification circuit 224 detects a clause with any one of "VX", "LX" and "NX". The Japanese code conversion circuit 225 converts a word having no Japanese code into a Japanese code. There are clauses and words to which "KX", "VX", "LX" or "NX" is added, and others which have a Japanese code contained in the additional information associated with the index word. The addition of "KX" to a word is detected by the KX determination circuit 222, which deletes "KX", feeding the Japanese code alone to the output register 226. A vocabulary or adjunctive information is detected by the Japanese code determination circuit 223, which retains only the Japanese code in the additional information to be fed to the output register 226, deleting all other additional information. A word with "VX", "LX" or "NX" added is fed through the KX determination circuit 222 and the Japanese code determination circuit 223 to the VX, LX and NX identification circuit 224, which identifies these functional words and supplies the word together with the identification information to the Japanese code conversion circuit 225. As an example of a word with "VX" added, the clause "PGAVX" appearing in the tenth line of the input sentence shown in Fig. 3a is converted by this circuit into "PGA", deleting "VX".
A word with "LX" is exemplified by "ENERUGIILX" appearing in the sixth line of the sentence shown in Fig. 1a, and is converted by the circuit 225 into a katakana spelling of the Japanese code corresponding to "ENERGY", deleting "LX". A word with "NX" is exemplified by "10NX" appearing in the sixth line of the sentence shown in Fig. 3a, and is converted by the circuit 225 into "10" of the Japanese code (however in the form of consecutive figures), deleting "NX". A word which is represented in hiragana of the kana code without any additional information until it reaches the kanji conversion section 310 is converted into hiragana of the Japanese code in the Japanese code conversion circuit 225. By the procedure mentioned above, the clauses are outputted in the form of a kanji and kana compounded sentence of the Japanese code from the output register 226 to the output section 40.
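The treatment of the functional words "VX", "LX" and "NX" may be sketched as a dispatch on the two-letter suffix. The table `KATAKANA_TABLE` and the function name are hypothetical, ordinary Unicode strings stand in for the Japanese code, and the numeral example is written here as "10NX".

```python
# Hypothetical table from a Roman spelling to its katakana spelling.
KATAKANA_TABLE = {"ENERUGII": "エネルギー"}

def convert_functional_word(word):
    """Convert a word carrying a VX, LX or NX suffix, deleting the suffix."""
    body, tag = word[:-2], word[-2:]
    if tag == "VX":
        # Alphabetic word output as it stands.
        return body
    if tag == "NX":
        # Numeral output as consecutive figures.
        if not body.isdigit():
            raise ValueError(f"not a numeral: {body!r}")
        return body
    if tag == "LX":
        # Loan word output in katakana.
        return KATAKANA_TABLE[body]
    raise ValueError(f"no functional word suffix: {word!r}")

converted = [convert_functional_word(w) for w in ("PGAVX", "10NX", "ENERUGIILX")]
```

Words with "KX", and words already carrying a Japanese code, are handled before this step in the section described above and are deliberately not covered by the sketch.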
When converting into a kanji and kana compounded sentence, "[" and "]" may be used as in a telegram by storing in the vocabulary information storage circuit 166 of Fig. 17 vocabulary information having index words "(K1G2K1T0K5)" and "(K1G2T5D2)" for the clauses "(K1G2K1T0K5)" and "(K1G2T5D2)" corresponding to "KAGIKAKKO" and "KAGITOJI" appearing in the seventh and eleventh lines of the sentence shown in Fig. 3a, as tabulated in Table 17.
7. Output stage
Referring to Fig. 24, the output stage 40 comprises an input register 231, an output telegram formulation circuit 232, a monitor output printer 233 and a tape punch 234. Upon receiving signals in units of clauses from the kana-to-kanji conversion stage 30 through the input register 231, the formulation circuit 232 formulates the header 2, text 3 and trailer 4 of the input sentence into a given output telegram form. Feeding the monitor section 233 from the circuit 231 provides a kanji and kana compounded output telegram, while feeding the tape punch 234 provides a kanji and kana compounded sentence in the form of a tape perforated in accordance with the kanji code.
8. Modification
While the invention has been illustrated and described specifically with respect to a particular embodiment, it should be understood that it is exemplary only and not limitative of the scope of the invention. In particular, the application of the system according to the invention is not limited to the press as described in connection with the embodiment, but the system may equally be used in the translation of general documents of various kinds including diplomatic and business letters. It is within the skill of the art to modify the format of the input and/or output sentences or to change or remove part of the arrangement depending on the intended application. For example, the specific cipher and/or specific word may be processed by retrieving corresponding vocabulary information from the vocabulary memory in a similar manner as for other clauses or words, without being converted into a final or intermediate output word in the input stage. In this modification, the use of the functional words "KX", "LX" and "VX" may be avoided. In the described embodiment, a variety of determination means are used in combination to avoid an erroneous conversion process, but an inexpensive system may be provided if a degree of conversion error is permitted.
For example, it will be noted that the word construction determination section or concept information determination section may be omitted.
On the other hand, while the embodiment described operates in a manner such that a Roman sentence of Japanese language is initially converted into a hiragana sentence which is then converted into a kanji and kana compounded sentence, such a hiragana sentence may be replaced by a katakana sentence, since there is a one-to-one correspondence between hiragana and katakana as mentioned under the paragraph of "Denotation of Japanese Language".
In order to receive a katakana, a hiragana or a Roman sentence of Japanese language in which each character is encoded by six bits and to convert it into a kanji and kana compounded sentence, the system of the invention may be associated with an adaptor as shown in Fig. 25, wherein similar numerals are used as before. Referring to Fig. 25, the adaptor 240 comprises an off-line input channel including a six bit paper tape reader 241 and a read-out circuit 242, and an on-line input channel including a reception circuit 248 and a serial-to-parallel conversion circuit 245. Each of the input channels can be selectively connected through a selection switch S1 or S2 with a first conversion circuit 243, a second conversion circuit 246 and a third conversion circuit 247, the output of which is fed through a common output register 244 to the message formatting section 12. In the example shown, each conversion circuit outputs a signal indicative of a Roman alphabet according to the pseudo international code, using six bits per character, which is formed by uniform one bit addition to the international code which uses five bits per character. However, the input to the respective conversion circuits may be different. By way of example, the first conversion circuit 243 may receive a katakana encoded by six bits per character, the second conversion circuit 246 a hiragana encoded by six bits per character, and the third conversion circuit 247 a signal indicative of a Roman alphabet which is encoded by six bits per character according to a coding scheme which is different from the international or pseudo international code.
The conversion process in the respective conversion circuits may be achieved by scanning a suitable memory for retrieval of an output character or characters corresponding to an input character. With the adaptor 240, when inputting a signal of a particular encoding system to a selected input channel, the selection switch S1 or S2 of that channel may be operated to supply the signal to the input of whichever of the conversion circuits 243, 246 or 247 outputs the desired encoding system. The output register 244 provides an output signal of the pseudo international code to the message formatting section 12. The subsequent operation takes place in a similar manner as mentioned above.
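The formation of the pseudo international code by uniform one bit addition may be illustrated as below. The text does not state the position or the value of the added bit, so placing it as the most significant bit and leaving its value selectable are assumptions made only for this sketch; the function name is hypothetical.

```python
def to_pseudo_international(code5, extra_bit=0):
    """Form a six bit code from a five bit code by adding the same one bit
    to every character. The added bit is assumed to be the most significant
    bit; its position and value are not given in the text."""
    if not 0 <= code5 < 32:
        raise ValueError("a five bit code must lie in the range 0 to 31")
    if extra_bit not in (0, 1):
        raise ValueError("the added bit must be 0 or 1")
    return (extra_bit << 5) | code5

six_bit = to_pseudo_international(0b10101, extra_bit=1)
```

Because the same bit is added to every character, the five bit code can be recovered by masking the low five bits, whichever value of the added bit is used.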

Claims (7)

  1. A system for converting a Roman sentence of Japanese language into a kanji and kana compounded sentence usually used in Japan, the Roman sentence comprising a series of Roman alphabet characters for each clause of the phonation of a Japanese language, with a space corresponding to at least one character between adjacent clauses, the system comprising first means for producing a first encoded signal representing the Roman sentence, second means for converting the first encoded signal into a second encoded signal representing a kana sentence which corresponds to the Roman sentence, third means for converting the second encoded signal into a third encoded signal representing a kanji and kana compounded sentence which corresponds to the kana sentence, said third means evaluating if the construction of each clause of the kana sentence represents a combination of a single independent word or a selected number of independent words with a selected number of adjunctives to add grammatical information of the words and/or adjunctives as well as the third encoded signal representing the words and/or adjunctives, thereby examining if the evaluation of the clause construction is grammatically acceptable on the basis of the grammatical information, said third means repeating the evaluation of the clause construction whenever the examination indicates a grammatically unacceptable result until a proper clause construction is detected, whereupon the clause and grammatical information are deleted to leave the third encoded signal alone, and fourth means for outputting the third encoded signal.
  2. A system according to Claim 1 in which the first encoded signal is encoded by six bits per character, the second encoded signal is encoded by six bits per character, and the third encoded signal is encoded by two sets of six bits per character, and further including means for receiving an input encoded signal indicative of a Roman sentence which is the duplication of the Roman sentence and encoded by five bits per character, said first means converting the input encoded signal into the first encoded signal.
  3. A system according to Claim 1 in which the first encoded signal is encoded by six bits per character, the second encoded signal is encoded by six bits per character, and the third encoded signal is encoded by two sets of six bits per character, and further including means for receiving an input encoded signal indicative of a katakana sentence which is in accordance with the Roman sentence and encoded by six bits per character, said first means converting the input encoded signal into the first encoded signal.
  4. A system according to Claim 1 in which the first encoded signal is encoded by six bits per character, the second encoded signal is encoded by six bits per character, and the third encoded signal is encoded by two sets of six bits per character, and further including means for receiving an input encoded signal indicative of a hiragana sentence which is in accordance with the Roman sentence and encoded by six bits per character, said first means converting the input encoded signal into the first encoded signal.
  5. A system according to Claim 1 in which the first encoded signal is encoded by six bits per character, the second encoded signal is encoded by six bits per character, and the third encoded signal is encoded by two sets of six bits per character, and further including means for receiving an input encoded signal indicative of a Roman sentence which is the duplication of the Roman sentence and encoded by six bits per character, said first means converting the input encoded signal into the first encoded signal.
  6. 6. A system according to Claim I in which said third means for converting the second encoded signal into the third encoded signal comprises first memory means storing a number of vocabulary informations each including an index word formed by a respective word, grammatical information of the word and the thrid encoded signal representing the word, means for splitting the kana sentence into clauses, means for retrieving from the first memory means vocabulary information having an index word which corresponds fully or partly to a character train which constitutes a clause in a sequential manner with respect to each clause. means for adding the vocabulary information to the second encoded signal indicative of the word which is used for retrieval from the first memory means, second memory means storing a number of adjunctive informations each including an index word formed by a respective adjunctive, grammatical information of the adjunctive and the third encoded signal representing the adjunctive, means for retrieving from the second memory means the adjunctive information having an index word which corresponds fully or partly to a character train which constitutes any remainder of the clause other than the character train constituting the word, means for adding the adjunctive information to the second encoded signal indicative of the adjunctive which is used for retrieval from the second memory means, means operative whenever retrieval of the adjunctive information from the second memory means fails to retrieve from the first memory means vocabulary information having an index word which corresponds fully or partly to the character train of the remainder of the clause, third memory means storing a number of part of speech informations each including an index word formed by each part of speech, part of speech information indicating those parts of speech which are grammatically catenable with the part of speech chosen as the index word and word variety information, 
first examining means for examining the catenation capability between a word and its adjoining adjunctive in a clause on the basis of the part of speech information stored in the third memory means, second examining means for examining the conjugation capability between two adjoining words in a clause on the basis of the part of speech information stored in the third memory means, and means for replacing a word portion and an adjunctive portion of a clause by the third encoded signals contained in their associated vocabulary and adjunctive informations, respectively.
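The lookup sequence recited in Claim 6 can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions: the three tables (`WORDS`, `ADJUNCTS`, `CATENABLE`), their entries and the code strings such as `W001` are hypothetical; clauses are assumed to arrive already separated by spaces; "corresponds fully or partly" is read as longest-prefix matching; and the two examining means are folded into one check against a single catenation table.

```python
# Hypothetical toy tables; every entry and code below is illustrative only.
WORDS = {      # "first memory": index word -> (part of speech, third code)
    "わたし": ("noun", "W001"),
    "ほん": ("noun", "W002"),
    "よむ": ("verb", "W003"),
}
ADJUNCTS = {   # "second memory": adjunctive -> (part of speech, third code)
    "は": ("particle", "A01"),
    "を": ("particle", "A02"),
}
CATENABLE = {  # "third memory": part of speech -> parts of speech that may follow
    "noun": {"particle", "noun"},
    "verb": {"auxiliary"},
    "particle": {"particle"},
}


def longest_prefix(text, table):
    """Return the longest key of `table` that is a prefix of `text`, or None."""
    for end in range(len(text), 0, -1):
        if text[:end] in table:
            return text[:end]
    return None


def convert_clause(clause):
    """Replace the word and adjunctive portions of one clause by their codes."""
    codes, prev_pos, rest = [], None, clause
    while rest:
        # The head of a clause is looked up as a word. The remainder is tried
        # as an adjunctive first and, if that fails, as another word.
        tables = (WORDS,) if prev_pos is None else (ADJUNCTS, WORDS)
        for table in tables:
            key = longest_prefix(rest, table)
            if key is not None:
                break
        else:
            raise LookupError(f"no dictionary entry for {rest!r}")
        pos, code = table[key]
        # Catenation check between adjoining portions of the clause.
        if prev_pos is not None and pos not in CATENABLE.get(prev_pos, set()):
            raise ValueError(f"{prev_pos} cannot be followed by {pos}")
        codes.append(code)
        prev_pos, rest = pos, rest[len(key):]
    return codes


def convert_sentence(sentence):
    """Assumes clauses are already separated by spaces in the kana sentence."""
    return [convert_clause(clause) for clause in sentence.split()]
```

With these toy tables, `convert_sentence("わたしは ほんを よむ")` yields one list of codes per clause, and a clause such as `"よむは"` is rejected because the table does not allow a particle after a verb.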
  7. A system according to Claim 6 in which said third means further comprises fourth memory means storing word construction patterns of the Japanese language, and means interposed between the second examining means and the replacing means for comparing a combination of word portion and adjunctive portion of a clause against the construction patterns stored in the fourth memory means.
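The pattern comparison of Claim 7 might be sketched as follows. The claim does not say how a construction pattern is encoded, so this sketch assumes, purely for illustration, that each pattern is a sequence of part-of-speech labels; `CONSTRUCTION_PATTERNS` and `matches_construction` are hypothetical names.

```python
# Hypothetical "fourth memory": allowed part-of-speech sequences for one clause.
CONSTRUCTION_PATTERNS = {
    ("noun",),
    ("noun", "particle"),
    ("noun", "noun", "particle"),
    ("verb",),
    ("verb", "auxiliary"),
}


def matches_construction(pos_sequence):
    """Return True if the clause's part-of-speech sequence is a stored pattern."""
    return tuple(pos_sequence) in CONSTRUCTION_PATTERNS
```

In such a scheme a clause analysed as noun plus particle would pass, while one analysed as particle plus noun would be rejected before its portions are replaced by codes.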
GB34030/77A 1977-09-19 1977-09-19 Translation system Expired GB1596411A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB34030/77A GB1596411A (en) 1977-09-19 1977-09-19 Translation system

Publications (1)

Publication Number Publication Date
GB1596411A (en) 1981-08-26

Family

ID=10360496

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4644492A (en) * 1979-10-05 1987-02-17 Canon Kabushiki Kaisha Plural mode language translator having formatting circuitry for arranging translated words in different orders
DE3248384A1 (en) * 1982-01-20 1983-07-28 Voest-Alpine AG, 1011 Wien DEVICE FOR DRYING SOLIDS
EP0091317A2 (en) * 1982-04-07 1983-10-12 Kabushiki Kaisha Toshiba Syntax analyzing method and apparatus
EP0091317A3 (en) * 1982-04-07 1984-02-29 Kabushiki Kaisha Toshiba Syntax analyzing method and apparatus
US4586160A (en) * 1982-04-07 1986-04-29 Tokyo Shibaura Denki Kabushiki Kaisha Method and apparatus for analyzing the syntactic structure of a sentence
US5220503A (en) * 1984-09-18 1993-06-15 Sharp Kabushiki Kaisha Translation system
EP0180888A2 (en) * 1984-10-29 1986-05-14 Hitachi, Ltd. Method and apparatus for natural language processing
EP0180888A3 (en) * 1984-10-29 1986-08-27 Hitachi, Ltd. Method and apparatus for natural language processing
US5109509A (en) * 1984-10-29 1992-04-28 Hitachi, Ltd. System for processing natural language including identifying grammatical rule and semantic concept of an undefined word

Similar Documents

Publication Publication Date Title
D. Becker Multilingual word processing
JP2000514218A (en) Word recognition of Japanese text by computer system
CN100462901C (en) GB phoneticize input method
CN103034625A (en) System and method for detecting and correcting mismatched Chinese character
GB1596411A (en) Translation system
Nandasara et al. Bridging the digital divide in Sri Lanka: some challenges and opportunities in using Sinhala in ICT
CN105511636B (en) Improved whole Chinese character Chinese word simply unifies input method without repeated code
JPS5822767B2 (en) Japanese typewriter
Li et al. The study of comparison and conversion about traditional Mongolian and Cyrillic Mongolian
US7539611B1 (en) Method of identifying and highlighting text
CN104615269A (en) Tibetan and Latin complete-short-form binary-syllabification encoding scheme and intelligent input system thereof
KR20090042201A (en) Method and apparatus for automatic extraction of transliteration pairs in dual language documents
Aufrecht Ujjvaladatta's Commentary on the Uṇādisūtras: Edited from a Manuscript in the Library of the East India House
King Functions required of a translation system
JPS6133569A (en) "kana"/"kanji" converter
DE2741822C2 (en)
CN114185441A (en) Simple coincident code-free universal input method for all Chinese words
JP3048793B2 (en) Character converter
Tumasonis Encoding of Lithuanian accented letters
Chang et al. A Statistical Approach to Automatic Phonetic Transcription of Japanese Orthographic Words
Jiang The current status of sorting order of Tibetan dictionaries and standardization
JPH0361219B2 (en)
JPH0334058A (en) Word sound punctuation system and word sound/kanji conversion system in language using kanji
JPS61114366A (en) Correction processing system of japanese word text data
CN111581991A (en) Han blindness translation method and system based on end-to-end neural machine translation

Legal Events

Date Code Title Description
PS Patent sealed
PCNP Patent ceased through non-payment of renewal fee

Effective date: 19960919