NL8702359A

NL8702359A - LANGUAGE ANALYSIS DEVICE.

Info

Publication number: NL8702359A
Application number: NL8702359A
Authority: NL
Original assignee: Ricoh Kk
Priority date: 1986-10-03
Filing date: 1987-10-02
Publication date: 1988-05-02
Also published as: FR2604814B1; FR2604814A1; DE3733674A1; DE3733674C2

Description


Language Analysis Device

The present invention relates to a language analysis device, and more particularly to a language analysis device for analyzing natural languages, which can be used in automatic translation devices.

The language analysis devices known from the prior art have a number of problems, as described below.

When analyzing the morphemes in a sentence, for example words, it is important to judge whether a given word is used on its own or as part of a compound word or an expression in which it is coupled with one or more other words. This judgment has a significant influence on the result of the analysis.

Conventional analyzers use one of two schemes. In the first, the analysis is carried out both for the case in which consecutive words are assumed to form an expression and for the case in which they are independent words, and a suitable translation is finally selected on the basis of the results. In the second, such consecutive words are preferentially judged to be an expression. The former scheme requires a long processing time, while the latter has a high probability of producing an incorrect translation.

In the morphological analysis performed when generating a translation, the part of speech or similar information of a morpheme, such as a word, is obtained from a dictionary. Because most ordinary words such as nouns and verbs can be included in a dictionary, they can easily be looked up to obtain this information.

However, because expressions such as the designations for length (m), speed (m/s), acceleration (m/s²) and other units occur in many different forms, it is not efficient to include them all in the dictionary, since doing so wastefully increases the memory capacity needed to store the dictionary information.

In the morphological analysis of a sentence, the numerical expression in one language does not always correspond to that in other languages. For example, the fundamental concept for counting with numbers, that is, the positional notation, differs between Japanese and European languages such as English.

For example, if the English numerical expression "a hundred and two thousand two hundred and four" is simply decomposed into its constituent parts and each part is merely replaced by the corresponding Japanese expression, the analysis would read "100 and 2200 and 4". It should ultimately be translated correctly as "102,204", that is, the Japanese expression for one hundred and two thousand two hundred and four.

In the morphological analysis performed for a translation, an input sentence is divided into dictionary reference units by means of delimiters such as spaces, commas and periods, and the dictionary is consulted with these dictionary reference units to obtain part-of-speech information and other information. In such a case a proper noun, for example, may be used in two or more different meanings depending on the context. The expression "Osaka City" is used to denote a group as a subject, as in "Osaka City has decided ...", and also to denote the place as an object, as in "... in Osaka City". Because until now only one meaning has been stored for each proper noun, these differences in meaning cannot be taken into account, which reduces the parsing accuracy during analysis.

In the English sentence "... in the Central Park John Willson had a ...", for example, the sentence cannot be parsed correctly unless it is recognized that a boundary must be placed between "Central Park" and "John Willson" in this context. In the same way, in the English sentence "... in Boston Mr. Baker was ...", it must be recognized that a boundary is to be placed between "Boston" and "Mr. Baker". In conventional systems, however, such a sequence of proper nouns is wrongly recognized as a single coherent combination of words.

When analyzing morphemes in an English sentence, a sequence of words beginning with a capital letter is generally analyzed as a proper noun. If, however, several capitalized words follow one another, it is not always correct to recognize them as a whole as a single proper-noun expression. There are cases in which they actually form several proper-noun expressions that happen to occur in succession. When morphemes are analyzed during translation, part-of-speech information and other information is obtained by looking up a dictionary. Because most ordinary nouns, verbs and the like can be stored in the dictionary, they can easily be retrieved to obtain the information.

However, because so many different kinds of proper nouns can occur, it is impossible to store them all in a dictionary. Proper nouns that are not registered in the dictionary therefore cannot be recognized as proper nouns.

If a character array combined in a particular pattern is present, there is a high probability of erroneous parsing when the processing for regular word units, as applied to ordinary sentences, is applied to that character array, and this may result in a meaningless translation.

In the case of the English character array "Sunday, 26 Jan., '80", for example, which expresses a point in time, translation into Japanese yields nothing more than a series of nouns and numbers such as "Sunday, 26, Jan., '80".

Furthermore, because no estimate is made of the part of speech and the semantic nature of derived words, a translation suited to the case at hand cannot be obtained.

The context-free grammar (CFG) used for analyzing a language has the drawback that many unnecessary solutions are generated, in both bottom-up and top-down parsing of the sentence, that ultimately cannot be used. Many of these superfluous solutions are evidently recognizable as errors when they are actually read. However, because a structural transformation or translation generation is also carried out for these superfluous solutions, and the correctness of the result is judged in the respective processing steps, they result in a large amount of wasted processing time.

The English word "let", for example, can express both an instruction and an invitation, so the parsing must be carried out for each possibility, which reduces efficiency. It is also difficult to select one of the two.

Moreover, in English a number of words can be combined by hyphens, as in "take-care-of-him attitude", so that an adjectival group is composed freely. Such groups are difficult to handle with the ordinary parsing grammar.

The same situation applies to appended (tag) questions. Although the form of the appended question in English is very restricted, it requires extremely complicated processing within the usual analysis methods. Nor is it easy to determine which verb the appended question relates to.

The present invention has been made in view of the above-mentioned problems.

A first object of the present invention is to provide a language analysis device that can assess the degree of coupling between two consecutive words and, based on the result, judge whether or not they form an expression.

A second object of the present invention is to provide a language analysis device capable of morphological analysis of an input sentence containing a composite character array, such as a dimensional unit, without all such character arrays having to be stored in a dictionary.

A third object of the present invention is to provide a language analysis device that can perform a suitable morphological analysis of an expression containing numerical values.

A fourth object of the present invention is to provide a language analysis device capable of translating a proper noun into a meaning that fits its context.

A fifth object of the present invention is to provide a language analysis device with which morphemes can be analyzed correctly in an expression containing a series of consecutive proper nouns.

A sixth object of the invention is to provide a language analysis device that can process an unregistered proper noun and carry out a suitable analysis of a proper noun, taking into account its relation to the word groups that occur before or after it.

A seventh object of the present invention is to provide a language analysis device with which morphemes can be analyzed correctly in a character array that has a particular meaning because its elements are combined according to a particular rule.

An eighth object of the present invention is to provide a language analysis device that can estimate, according to a predetermined rule, the grammatical nature, meaning and the like of a word that is recognized as a derivative.

A ninth object of the present invention is to provide a language analysis device that can recognize the structural features of an input sentence and carry out parsing in accordance with those features.

The first object of the invention can be achieved by a language analysis device comprising:
a dictionary memory in which dictionary data are stored, including morpheme data for words, compound words and phrases, and
a parser for performing a morphological analysis of an input sentence with reference to said dictionary memory, wherein
said dictionary memory contains data indicating the degree of coupling between the words belonging to compound words or phrases, and
said parser refers to the dictionary memory for the respective words present in the input sentence and, if several dictionary entries are found for a word that forms a combination with other words, selects the combination of words with the higher degree of coupling by referring to the coupling-degree data.
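The selection by degree of coupling can be illustrated with a minimal sketch. The entries, the numeric coupling degrees and the function name `segment` are assumptions made for illustration only; the description does not specify how the degree of coupling is encoded or compared.

```python
# Hypothetical dictionary (glossary) entries: each multi-word expression
# carries an assumed coupling degree between 0 (loose) and 1 (tight).
COUPLING = {
    ("in", "charge", "of"): 0.9,
    ("of", "course"): 0.8,
}

def segment(words, entries=COUPLING):
    """Group words into expressions, preferring the higher coupling degree."""
    # Collect every dictionary expression that occurs in the sentence.
    candidates = []
    for i in range(len(words)):
        for entry, degree in entries.items():
            if tuple(words[i:i + len(entry)]) == entry:
                candidates.append((degree, i, i + len(entry)))
    # Accept candidates from strongest to weakest coupling, skipping overlaps.
    taken = [False] * len(words)
    spans = []
    for degree, start, end in sorted(candidates, reverse=True):
        if not any(taken[start:end]):
            spans.append((start, end))
            for k in range(start, end):
                taken[k] = True
    # Words not covered by an accepted expression stay independent units.
    spans += [(i, i + 1) for i, used in enumerate(taken) if not used]
    return [" ".join(words[s:e]) for s, e in sorted(spans)]

print(segment("he is in charge of course material".split()))
```

Here the word "of" belongs to two competing entries, and the one with the higher assumed coupling degree ("in charge of") wins, so "course" remains an independent word.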

The second object of the invention can be achieved by a language analysis device comprising:
input means for inputting a character array in a predetermined language,
a basic dictionary memory that is used for looking up the input character array and in which basic data are stored, and
a parser for analyzing the input character array by searching the basic dictionary, wherein
said parser consults the basic dictionary memory with the input character array and, if a part of said character array is found, consults the basic dictionary memory in the same way with the other parts of said character array, thereby analyzing the entire character array.
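A minimal sketch of this repeated lookup, assuming a greedy longest-match strategy (the description does not say how the parts are chosen) and a small hypothetical basic dictionary:

```python
# Hypothetical basic dictionary: only elementary unit symbols are stored,
# not every composite unit such as "m/s^2".
BASIC_DICTIONARY = {
    "km": "kilometre", "m": "metre", "kg": "kilogram", "g": "gram",
    "s": "second", "h": "hour", "/": "per", "^2": "squared",
}

def analyze(chars, dictionary=BASIC_DICTIONARY):
    """Split a character array into known parts; return None if a part is unknown."""
    parts = []
    pos = 0
    while pos < len(chars):
        # Try the longest remaining piece first, then shorter ones.
        for end in range(len(chars), pos, -1):
            piece = chars[pos:end]
            if piece in dictionary:
                parts.append((piece, dictionary[piece]))
                pos = end
                break
        else:
            # No piece starting at this position is in the dictionary.
            return None
    return parts

print(analyze("m/s^2"))
```

With this approach the dictionary only needs the elementary symbols; composite units are analyzed from their parts.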

The third object of the present invention can be achieved by a language analysis device comprising:
a dictionary memory in which dictionary data are stored for each dictionary reference unit, and
a parser for dividing an input sentence into dictionary reference units and for performing a morphological analysis of said dictionary reference units with reference to the dictionary memory, wherein
said dictionary memory is provided, in the dictionary data, with distinguishing indications that show whether the dictionary reference units represent numbers, and
said parser refers to the dictionary memory with the respective dictionary reference units present in the input sentence and, if said distinguishing indication is present in the dictionary entry found, the dictionary reference unit for which the indication was found is combined with a nearby dictionary reference unit for which another distinguishing indication is found, the numerical values represented by the two dictionary reference units are computed together into a single numerical value, and the dictionary reference units are converted into a single parsing unit.
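The combination of numeral units into one value can be sketched as follows. The table `NUMERAL_VALUE`, the handling of "a" and "and", and the function names are assumptions for illustration; presence in the table stands in for the distinguishing indication in the dictionary entry.

```python
# Hypothetical dictionary: a unit listed here is flagged as a numeral.
NUMERAL_VALUE = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "twenty": 20,
    "hundred": 100, "thousand": 1000,
}

def is_numeral(units, i):
    """Flagged units are numerals; 'a' counts only directly before a multiplier."""
    if i >= len(units):
        return False
    if units[i] == "a":
        return i + 1 < len(units) and units[i + 1] in ("hundred", "thousand")
    return units[i] in NUMERAL_VALUE

def value_of(run):
    """Compute the single numerical value of a run of numeral units."""
    total = current = 0
    for word in run:
        if word == "and":
            continue
        v = 1 if word == "a" else NUMERAL_VALUE[word]
        if v == 100:
            current = max(current, 1) * 100
        elif v == 1000:
            total += max(current, 1) * 1000
            current = 0
        else:
            current += v
    return total + current

def merge_numerals(units):
    """Replace each run of adjacent numeral units by one parsing unit."""
    out, i = [], 0
    while i < len(units):
        if not is_numeral(units, i):
            out.append(units[i])
            i += 1
            continue
        j = i + 1
        # Extend the run over numerals and over "and" followed by a numeral.
        while is_numeral(units, j) or (
            j < len(units) and units[j] == "and" and is_numeral(units, j + 1)
        ):
            j += 1
        out.append(str(value_of(units[i:j])))
        i = j
    return out

print(merge_numerals("a hundred and two thousand two hundred and four".split()))
```

The whole expression becomes the single unit 102204, rather than the piecewise "100 and 2200 and 4" that a word-by-word substitution would give.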

The fourth object of the present invention can be achieved by a language analysis device comprising:
input means for inputting a character array in a predetermined language,
a dictionary memory that is used for looking up said input character array,
search means for searching said dictionary memory with the input character array, and
type-information providing means for providing type information to any input character array that is not registered in the dictionary memory or whose type information is not registered in said dictionary memory, wherein
the type-information providing means provide a plurality of items of type information to said character array that has no type information.

The fifth object of the present invention can be achieved by a language analysis device comprising:
a dictionary memory in which dictionary data are stored for all dictionary reference units, and
parsing means for dividing an input sentence into dictionary reference units and for performing a morphological analysis by searching the dictionary memory with the dictionary reference units, wherein
the dictionary data in the dictionary memory are provided with distinguishing information that specifies the position of a dictionary reference unit representing a proper noun when several proper nouns can occur in a consecutive series of proper nouns, and
the parsing means search the dictionary memory with the respective dictionary reference units present in the input sentence and, if the distinguishing information is present in the dictionary data found, the dictionary reference unit for which the distinguishing information was found is combined with the directly adjacent dictionary reference unit that has a meaning other than that of a proper noun into a single parsing unit, in accordance with the position specified by the distinguishing information.
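One possible reading of the positional information is sketched below. The `POSITION` table and its "start"/"end" markers are hypothetical; the description only states that the dictionary data specify a position within a series of proper nouns.

```python
# Hypothetical positional markers stored with dictionary entries:
# "end" marks the last word of a name, "start" marks the first word.
POSITION = {"Park": "end", "City": "end", "Mr.": "start", "Dr.": "start"}

def split_proper_names(words):
    """Split a run of capitalized words into separate proper-noun units."""
    names, current = [], []
    for word in words:
        # A "start" word closes any name that is still open.
        if POSITION.get(word) == "start" and current:
            names.append(" ".join(current))
            current = []
        current.append(word)
        # An "end" word closes the name it belongs to.
        if POSITION.get(word) == "end":
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names

print(split_proper_names(["Central", "Park", "John", "Willson"]))
print(split_proper_names(["Boston", "Mr.", "Baker"]))
```

Under these assumptions "Central Park John Willson" is divided after "Park" and "Boston Mr. Baker" is divided before "Mr.", instead of each run being treated as one name.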

The sixth object of the invention can be achieved by a language analysis device comprising:
input means for inputting a character array in a predetermined language,
a dictionary memory that is used for looking up said character array input via the input means, and
type-information parsing means that search the dictionary memory with the input character array and analyze the type information of said character array, wherein
the type-information parsing means analyze the type information of the character array while taking into account the type information of the character arrays before and after said character array.
The type-information parsing means are arranged to analyze the type information of several character strings by grouping them collectively into character arrays.
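The use of neighbouring words to narrow down the type of a proper noun might look like the following sketch. The type names, the word lists and the function `resolve_type` are illustrative assumptions; an unregistered name receives every candidate type, in line with the fourth object, and the context then narrows the choice.

```python
# Hypothetical type inventory and context cues.
ALL_TYPES = ("person", "place", "organization")
PLACE_PREPOSITIONS = {"in", "at", "from", "to"}
SUBJECT_FOLLOWERS = {"has", "decided", "announced", "said"}

# Hypothetical dictionary: a registered name may carry several types.
DICTIONARY = {"Osaka City": ("place", "organization")}

def resolve_type(prev_word, name, next_word, dictionary=DICTIONARY):
    """Narrow the candidate types of a proper noun using adjacent words."""
    # Unregistered names start with every type as a candidate.
    candidates = list(dictionary.get(name, ALL_TYPES))
    if prev_word in PLACE_PREPOSITIONS:
        narrowed = [t for t in candidates if t == "place"]
    elif next_word in SUBJECT_FOLLOWERS:
        narrowed = [t for t in candidates if t in ("person", "organization")]
    else:
        narrowed = candidates
    # Fall back to all candidates if the context rules them all out.
    return narrowed or candidates

print(resolve_type("in", "Osaka City", "the"))
print(resolve_type(None, "Osaka City", "has"))
```

The same name thus resolves to a place after "in" and to an acting group before "has".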

The seventh object of the present invention is met by a language analysis device comprising: a dictionary memory in which dictionary data are stored for all dictionary reference units; and parsing means for dividing an input sentence into dictionary reference units and for performing a morphological analysis of those dictionary reference units by searching the dictionary memory, wherein the parsing means recognize that a series of dictionary reference units having a specific semantic element forms a composite unit that has a specific meaning according to a particular rule, and convert a sequence of dictionary reference units having that specific semantic element into a single parsing unit.
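As an illustration of the seventh object, the sketch below merges a run of dictionary reference units that share a semantic element into a single parsing unit. The choice of "numeric" as the semantic element, the feature table `FEATURES` and the label `numeric_expression` are assumptions made for this example only.

```python
# Hypothetical dictionary: reference unit -> semantic feature.
FEATURES = {
    "$": "currency",
    "million": "magnitude",
    "dollars": "currency",
}

# Features that count as the shared "numeric" semantic element (assumed).
NUMERIC = {"currency", "number", "magnitude"}


def feature_of(unit):
    """Return the semantic feature of one dictionary reference unit."""
    # Treat strings such as "5", "1,500" or "2.5" as numbers.
    if unit.replace(",", "").replace(".", "", 1).isdigit():
        return "number"
    return FEATURES.get(unit.lower(), "other")


def merge_numeric_units(units):
    """Collapse each run of numeric units into one parsing unit."""
    merged = []
    run = []
    for unit in units:
        if feature_of(unit) in NUMERIC:
            run.append(unit)
            continue
        if run:
            merged.append((" ".join(run), "numeric_expression"))
            run = []
        merged.append((unit, feature_of(unit)))
    if run:
        merged.append((" ".join(run), "numeric_expression"))
    return merged
```

Later analysis stages then see "$ 5 million" as one unit instead of three, which is the effect the paragraph above attributes to the parsing means.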

The eighth object of the present invention can be met by a language analysis device in which the grammatical nature, the semantic nature or the translated word is estimated for a word that is not registered in the dictionary and that is recognized, on the basis of its morpheme properties, as a derivative, for example one formed with a suffix.

The ninth object of the present invention can be met by a language analysis device comprising: a first parsing means for performing a morphological analysis on an input sentence in a predetermined language; a second parsing means for performing an analysis of the sentence in said language based on the result of the morphological analysis in the first parsing means; a dictionary memory storing dictionary data of said language that are used for the analysis by the first and second parsing means; and control means for searching the dictionary memory and for causing the first and second parsing means to carry out the parsing procedure, wherein the first parsing means searches the dictionary memory, identifies structural arrangements by distinguishing the features of the input sentence in said language with regard to its form, and estimates the attitude that can result from the parsing and the structural role that each such arrangement plays in the sentence, and the second parsing means analyzes the surface structure of the sentence in said language using a grammatical rule based on the estimated attitude and role, and analyzes the possible dependency relations among the constituent parts of the sentence.
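The suffix-based estimation named in the eighth object can be pictured with the following minimal sketch. The dictionary entries, the suffix table `SUFFIX_RULES` and the spelling adjustment are hypothetical; the source does not specify which suffixes or rules the device uses.

```python
# Hypothetical registered words and their grammatical types.
DICTIONARY = {"happy": "adjective", "quick": "adjective", "teach": "verb"}

# Hypothetical suffix table: (suffix, required stem type, estimated type).
SUFFIX_RULES = [
    ("ness", "adjective", "noun"),
    ("ly", "adjective", "adverb"),
    ("er", "verb", "noun"),
]


def estimate_type(word):
    """Return (type, explanation) for a word, registered or not."""
    word = word.lower()
    if word in DICTIONARY:
        return DICTIONARY[word], "registered"
    for suffix, stem_type, derived_type in SUFFIX_RULES:
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        # Also try undoing the y -> i spelling change (happy -> happiness).
        candidates = [stem]
        if stem.endswith("i"):
            candidates.append(stem[:-1] + "y")
        for candidate in candidates:
            if DICTIONARY.get(candidate) == stem_type:
                return derived_type, "estimated from " + candidate + " + -" + suffix
    return "unknown", "no rule applied"
```

The same lookup could return an estimated semantic class or translation instead of a grammatical type; only the grammatical case is shown here.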

With the language analysis device according to the present invention, automatic translation can be performed at high speed and with high quality.

A more detailed explanation of the language analysis device according to the present invention will now be given with reference to nine embodiments illustrated in the figures.

However, the present invention is not limited to these embodiments.

Figures 1 to 10 illustrate a first embodiment of the language analysis device according to the present invention, applied to automatic translation from English to Japanese.

Figure 1 is a functional block diagram illustrating an example of the detailed structure of a morphological analysis section.

Figure 2 is a functional block diagram illustrating the overall structure.

Figure 3 is an explanatory view illustrating an example of the structure of a dictionary file provided with a highest-priority flag.

Figure 4 is a flow chart illustrating an example of a morphological analysis process.

Figure 5 is a flow chart illustrating an example of the input processing in the morphological analysis process.

Figure 6 is an explanatory view illustrating an example of the conversion of an input character array.

Figure 7 is an explanatory view illustrating an example of a dictionary search process.

Figures 8A to 8D are flow charts illustrating an example of the contradiction elimination for the highest-priority flag during the morphological analysis.

Figure 9 is an explanatory view illustrating an example of the contents of a retrieved dictionary information buffer after the dictionary search process.

Figure 10 is an explanatory view illustrating an example of the contents of the retrieved dictionary information buffer as a result of performing the contradiction elimination for the highest-priority flag.

Figures 11 to 16 illustrate the second embodiment of the present invention.

Figure 11 is a block diagram of the embodiment.

Figure 12 shows, in a number of diagrams, an example of the data stored in the dictionary.

Figure 13 is a diagram illustrating an example of the data stored in a basic unit dictionary.

Figure 14 is a diagram illustrating an example of the data stored in a dictionary information retention table.

Figure 15 is a flow chart illustrating the operation of the device.

Figure 16 is a flow chart illustrating the unit recognition.

Figures 17 to 29 illustrate the third embodiment of the language analysis device according to the present invention, applied to automatic translation from English to Japanese.

Figure 17 is a functional block diagram illustrating an example of the detailed structure for performing the morphological analysis.

Figure 18 is a functional block diagram illustrating the overall structure.

Figures 19A and 19B are flow charts illustrating an example of the morphological analysis.

Figure 20 is a flow chart illustrating an example of the collective arrangement of a currency symbol and a unit in the morphological analysis.

Figures 21A and 21B are flow charts illustrating an example for numbers joined by a hyphen in the morphological analysis.

Figures 22A and 22B are flow charts illustrating an example of the processing of consecutive numbers during the morphological analysis.

Figures 23A and 23B are flow charts illustrating an example of the collective aggregation with a preceding numerical value during the morphological analysis.

Figure 24 is an explanatory view illustrating an example of the structure of a dictionary file provided with a numeric flag.

Figure 25 is an explanatory view illustrating an example of an input character array.

Figures 26A to 26D are explanatory views illustrating the contents of the dictionary information retention table, obtained from the dictionary using the input character array illustrated in Figure 25, in successive processing steps.

Figure 27 is an explanatory view illustrating another example of an input character array.

Figure 28 is an explanatory view illustrating the contents of a currency symbol table, a positional notation table and a decimal point table in the dictionary.

Figures 29A to 29D are explanatory views illustrating an example of the dictionary information retention table, obtained from the dictionary using the input character array shown in Figure 27, in various processing steps.

Figures 30 to 36 illustrate the fourth embodiment of the present invention.

Figure 30 is a block diagram of the embodiment.

Figure 31 is a diagram illustrating an example of the data stored in another dictionary.

Figure 32 is a flow chart illustrating the operation of the entire device.

Figure 33 is a flow chart illustrating the processing of a proper noun registered in the dictionary.

Figure 34 is a flow chart illustrating the processing of a proper noun not registered in the dictionary.

Figure 35 is a flow chart illustrating a process that yields standard type information.

Figure 36 is a diagram illustrating an example in which the data stored in the dictionary information retention table change after the input sentence has been processed.

Figures 37 to 46 illustrate the fifth embodiment of the language analysis device according to the present invention, applied to automatic translation from English to Japanese.

Figure 37 is a functional block diagram illustrating a detailed structure of the morphological analysis section.

Figure 38 is a functional block diagram illustrating the overall structure.

Figure 39 is an explanatory view illustrating an example of the structure of the dictionary file.

Figure 40 is a flow chart illustrating an example of the morphological parsing of a proper noun.

Figure 41 is a flow chart illustrating an example of the collective aggregation of proper nouns registered in the dictionary during the morphological analysis.

Figures 42, 43 and 44 are flow charts illustrating an example of processing that depends on the position information of the proper noun during the analysis of a proper noun.

Figure 45 is a flow chart illustrating an example of the collective aggregation of proper nouns not registered in the dictionary during the morphological analysis.

Figures 46A to 46F are explanatory diagrams illustrating the contents of the dictionary information retention table, referenced from the dictionary for an example of an input character array, at various processing steps.

Figures 47 to 52 illustrate the sixth embodiment of the present invention.

Figure 47 is a block diagram of an embodiment.

Figure 48 is a diagram illustrating an example of the data stored in the reference dictionary.

Figure 49 is a flow chart illustrating the operation of the entire device.

Figure 50 is a flow chart illustrating the processing of proper nouns registered in the dictionary.

Figure 51 is a flow chart illustrating the processing of proper nouns not registered in the dictionary.

Figure 52 is a diagram illustrating an example in which the data stored in the dictionary information retention table change after the input sentence has been processed.

Figures 53 to 57 illustrate the seventh embodiment of the language analysis device according to the present invention, applied to automatic translation from English to Japanese.

Figure 53 is a functional block diagram illustrating an example of the detailed structure of the morphological analysis section.

Figure 54 is a functional block diagram illustrating the overall structure.

Figures 55A and 55B are flow charts illustrating an example of the morphological analysis.

Figure 56 is an explanatory view illustrating an example of the contents of the information table in the section 108 that provides the morpheme processing information.

Figure 57 is an explanatory view illustrating an example of the contents of a correspondence table 128.

Figures 58 to 63 illustrate the eighth embodiment of the present invention.

Figure 58 is a block diagram explaining the overall structure.

Figure 59 is a block diagram explaining an example of the processing of derived words formed with a prefix.

Figure 60 is a diagram illustrating an example of the processing of derived words formed with a suffix.

Figure 61 is a block diagram of some details obtained by combining Figures 58, 59 and 60.

Figure 62 is a block diagram showing further details of an overall processing section for unregistered words in Figure 61.

Figure 63 is a block diagram explaining an embodiment of an automatic translation device in which the present invention is applied.

Figures 64 to 90 illustrate the ninth embodiment of the language analysis device according to the present invention, applied to translation from English to Japanese.

Figure 64 is a functional block diagram illustrating the overall structure.

Figure 65 is a functional block diagram that illustrates, collected into a block, the function of recognizing the structural arrangement of an English input sentence.

Figure 66 is a flow chart illustrating an example of the collective arrangement of a block in the English input sentence.

Figure 67 is a flow chart illustrating details of the word lookup process in the processing flow.

Figure 68 is an explanatory view illustrating an example of the dictionary information for English words or phrases stored in the dictionary.

Figure 69 is an explanatory view illustrating an example of the table data for the block start state, the block end state and the estimation states for target and role, stored in the analysis rule file.

Figure 70 is an explanatory view illustrating an example of the collective arrangement of a structure.

Figure 71 is an explanatory view illustrating an example of a collective arrangement for a block.

Figure 72 is an explanatory view illustrating English information and word information arranged together in a block.

Figure 73 is a flow chart illustrating an example of the analysis operation performed in an analysis section.

Figure 74 is an explanatory view, similar to Figure 68, illustrating an example of an entry in a word and phrase dictionary in the case where this embodiment is provided with the function of estimating identical cases.

Figure 75 is an explanatory view, similar to Figure 69, illustrating an example of the start and end states of a block and a table of the block preparation information in the case where this embodiment is provided with the function of estimating identical cases.

Figure 76 is a functional block diagram, similar to Figure 64, illustrating the overall structure of a modification of this embodiment.

Figure 77 is a functional block diagram, similar to Figure 76, that brings together the parsing functions for let information in the modified embodiment of Figure 76.

Figure 78 is an explanatory view illustrating an example of the dictionary information, containing let information for English words and phrases, stored in the dictionary of the modified embodiment.

Figures 79 and 80 are explanatory views, similar to Figure 72, illustrating an example of the block information and the word information in which English sentences with let information are collectively combined into a block.

Figure 81 is a flow chart, similar to Figure 73, illustrating an example of the series of steps for obtaining a collective arrangement of the let information from the English input sentence.

Figure 82 is a flow chart, similar to Figure 73, illustrating an example of the analysis processing performed in the analysis section of the modified embodiment when let information is present.

Figures 83A and 83B are flow charts illustrating an example of the analysis flow for let information in the input English sentence.

Figure 84 is a flow chart illustrating an example of the analysis flow for hyphenated words in the English sentence.

Figure 85 is an explanatory view, similar to Figure 72, illustrating an example of the block information and the word information that are combined into a block from an English input sentence containing hyphenated words.

Figure 86 is a functional block diagram, similar to Figure 64, illustrating the overall structure of another modified embodiment.

Figure 87 is a functional block diagram, similar to Figure 65, in which the function of analyzing an additive interrogation in the English input sentence is collectively brought together in the modified embodiment shown in Figure 93.

Figures 88 and 89 are explanatory views, similar to Figure 72, illustrating an example of block information and word information combined into a block from an English sentence that contains an additive interrogation.

Figures 90A and 90B are flow charts illustrating an example of an analysis flow for an additive interrogation in the English input sentence.

The first embodiment of the present invention will now be described.

Figure 2 illustrates the overall configuration of the first embodiment, in which the language analyzer according to the present embodiment is applied to an automatic device for translating from English to Japanese. The present invention can, of course, be used effectively not only in such an English-to-Japanese translation device but also in any other language analyzer in which the sentences of the input language are chiefly analyzed when one language is translated into another.

The embodiment illustrated in the drawing has an input section 1010 through which an English text 1012 to be translated into Japanese is entered. The input section 1010 includes, for example, a keyboard with character keys such as alphanumeric keys and function keys, an optical character reader (OCR unit) for reading English text recorded on paper, and/or a file storage device for reading English text stored on a storage medium such as a magnetic disk.

The English text entered from the input section 1010 is read into a pre-editing section 1014, in which preprocessing for the translation is performed. This section mainly performs sentence recognition and unknown-word processing, and it functions as part of the morphological analysis.

The pre-edited English data, together with the information obtained during pre-editing, are transferred to a morphological analysis section 1016. By referring to a glossary 1018, the section 1016 segments the sentence, analyzes the morphemes of the English sentence, performs various types of classification such as the processing of unknown words, proper nouns, time expressions, numerical expressions and the like, and performs processing over the whole sentence such as searching for fixed expressions and recognizing identical cases. The rules for the morphological analysis are stored in the analysis rule file 1036.

After the morphological analysis, the English data, together with the glossary information obtained during the morphological analysis, are transferred to an analysis section I 1020. The analysis section I 1020 is a functional section that analyzes the surface structure of an English sentence by applying grammatical rules to the English data and finding all structural possibilities.

After the analysis in the analysis section I 1020, the English data, together with the associated parse information, are supplied to the analysis section II 1022. In this section, one solution is selected by performing a syntactic analysis on the result of the surface-level analysis of analysis I. In this way a plausible syntax analysis of the English sentence is produced and its structure is formed. These analysis rules are also stored in the analysis rule file 1036.

After the analysis, the English data are transferred as syntax-analysis data to a structure transformation section 1024. From this syntax analysis, that is, from an intermediate structure of the English sentence, the structure transformation section 1024 produces a corresponding structure for the Japanese sentence, converting the intermediate structure into a Japanese underlying structure from which the Japanese translation can easily be generated.

The syntax-analysis data that represent the Japanese underlying structure, having undergone the structure transformation in this way, are sent to a translation generating section 1026, in which the translated sentence is formed. This is a functional section for generating a Japanese sentence from the Japanese tree structure.

The Japanese sentence data translated in this way, i.e. the data of the translated sentence, are supplied to a post-editing section 1030. The post-editing section 1030 modifies the translated sentence data by referring to the glossary 1018 and by applying information that was used in the translation operation, in order to obtain a more natural Japanese sentence. The Japanese sentence data are transferred to an output section 1032, which outputs them as the translated Japanese sentence 1034. The output section 1032 includes, for example, a printer, a display and/or a file storage device such as a magnetic disk memory.

The sequence of successive translation operations is controlled by a control section 1038, which controls the entire device. The glossary 1018 contains the glossary data for the words of the English and Japanese languages. In this embodiment it holds not only the vocabularies but also various kinds of information such as co-occurrence (that is, words that appear together), meanings, plural and singular forms, parts of speech, and so on. The analysis rule file 1036 stores the rules for the morphological analysis and the syntactic analysis.

The control section 1038 is coupled to an operation display section 1040. The operation display section 1040 comprises operation keys, such as translation instruction keys and cursor keys, with which an operator can issue various instructions to the device, and a display unit or indicator on which the entered English text, the Japanese sentence resulting from the translation, intermediate data such as glossary information, various instructions for the operator and so on can be shown. Most of the operation display functions can be implemented in a keyboard, if one is present in the input section 1010, or in a display unit that may be present in the output section 1032.

Figure 1 illustrates the detailed structure of the morphological analysis section 1016. The analysis section 1016 has an input device 1100, such as the keyboard of the input section 1010, and an input interface circuit 1104 that interfaces with the input document file 1102. The input interface circuit 1104 is provided with an input character array buffer, into which the English character array data are entered as coded data, for example ASCII, from the input device 1100 or from the input document file 1102, and in which the character array data are temporarily stored. The input character array may be the pre-edited array from the pre-editing section 1014.

As illustrated in the figure, the morphological analysis section 1016 comprises a processing section 1106, a glossary lookup section 1108, a contradiction elimination rule processing section 1110 and a control section 1112. The processing section 1106 is the analysis function section that performs the morphological analysis, and it includes a retrieved glossary information buffer, i.e. a glossary information retention table 1120 (see Figure 9). The morphological analysis is performed by controlling the glossary retrieval sequentially from the beginning of the input character array in accordance with the retrieval key character array, storing the glossary information obtained from the glossary lookup section 1108 in the glossary information retention buffer 1120, and processing the degree of preference in accordance with the highest preferred flag, as will be described later.

The glossary lookup section 1108 is a functional section that looks up glossary information by searching the glossary 1018 on the basis of the retrieval key character array supplied by the processing section 1106, and then transfers the glossary information found to the processing section 1106.

For each entry, the glossary 1018 contains grammatical information such as part-of-speech information and the inflection of the word, as well as a highest preferred flag, as shown for example entries in Figure 3. The glossary is referred to as a glossary file with a highest preferred flag. The "highest preferred flag" is a flag that indicates the degree of linkage between the words of a compound word or expression contained in the glossary data: a "0" indicates a weak link or no link, while a "1" indicates a strong link. In this case, a compound word or expression to which a strong link is assigned is expected to be used as an expression, while, on the other hand, the possibility that the words are used as individual words is also considered in parallel.

As shown by way of example in Figure 3, the entries in the glossary 1018 are compound words, expressions and the individual words that form part of them, and no distinction is made between individual words and compound words or expressions. Each inflected form is also part of the entry. If several inflected forms exist, each is recorded as a separate entry. The type of inflection is shown in the inflection field. The same applies to the part-of-speech information: several parts of speech and the information belonging to them can be registered. As further information, the countability or uncountability of a noun, the transitivity or intransitivity of a verb, a translated word and the like are recorded.

For example, "get" is the infinitive of a verb, and its highest preferred flag is "0". The phrase "get up" is an expression containing a verb in a particular mood, and its highest preferred flag is "1". Furthermore, the prepositional group "up to" has the highest preferred flag "1", whereas the noun group "white house", as a compound word, has the highest preferred flag "0", which shows that the degree of linkage between its words is low. In the figure, the symbol "⊔" denotes a blank character.

In this way, the glossary information obtained in the glossary lookup section 1108 contains the highest preferred flag. If identical or overlapping character arrays both have their highest preferred flag set to "1", this contradiction must be eliminated. The contradiction elimination rule processing section 1110 eliminates the contradiction, and it does so by referring to the contradiction elimination rule for the highest preferred flag stored in the analysis rule file 1036.
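The glossary entries and their highest preferred flag, as described for Figure 3, might be modeled as in the following minimal sketch. The class name `GlossaryEntry`, its field names and the helper `strongly_linked` are hypothetical and do not appear in the source; only the flag values for "get", "get up", "up to" and "white house" are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    headword: str        # entry text; a blank separates constituent words
    part_of_speech: str  # illustrative labels, e.g. "verb", "preposition"
    preferred: int       # highest preferred flag: 1 = strong link, 0 = weak or none


# Flag values follow the examples in the text; other fields are illustrative.
GLOSSARY = [
    GlossaryEntry("get", "verb", 0),
    GlossaryEntry("get up", "verb", 1),
    GlossaryEntry("up to", "preposition", 1),
    GlossaryEntry("white house", "noun", 0),
]


def strongly_linked(entries):
    """Return the headwords of multi-word entries whose flag is 1."""
    return [e.headword for e in entries if e.preferred == 1 and " " in e.headword]
```

Only entries returned by `strongly_linked` can give rise to the kind of contradiction described above, since a contradiction requires two overlapping candidates that both carry the flag "1".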

In the present embodiment, the contradiction elimination rule is applied in the order (1) to (3), which determines the preferred selection.

(1) An expression or word whose part of speech is a verb.

(2) A compound word, expression or word with many constituent words.

(3) A compound word, expression or word situated in the first part of the sentence.

Whether the word selected in this way, i.e. the analysis unit, is used is indicated by the active information in the retrieved glossary information buffer 1120 of the processing section 1106. The active information indicates that the analysis unit is valid or effective if it is "1", and that the unit is not used as a candidate if it is "0".
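One possible reading of rules (1) to (3) and of the active information is sketched below. It is an illustration under stated assumptions, not the procedure of Figures 8A to 8D: the rules are interpreted as a lexicographic sort key, "many constituent words" is read as "more words wins", and candidates whose flag is "0" are left untouched. The dictionary keys (`entry`, `pos`, `start`, `end`, `preferred`, `active`) and both function names are hypothetical.

```python
def rule_key(cand):
    """Smaller keys win: (1) verbs first, (2) more words, (3) earlier start."""
    return (cand["pos"] != "verb", -len(cand["entry"].split()), cand["start"])


def eliminate_contradictions(cands):
    """Set active to 0 for flagged candidates that lose an overlap conflict."""
    flagged = [c for c in cands if c["preferred"] == 1]
    for winner in sorted(flagged, key=rule_key):
        if winner["active"] == 0:
            continue  # already eliminated by a higher-ranked candidate
        for other in flagged:
            overlap = winner["start"] < other["end"] and other["start"] < winner["end"]
            if other is not winner and other["active"] == 1 and overlap:
                other["active"] = 0
    return cands


candidates = [
    {"entry": "get up", "pos": "verb", "start": 7, "end": 13, "preferred": 1, "active": 1},
    {"entry": "up to", "pos": "preposition", "start": 11, "end": 16, "preferred": 1, "active": 1},
    {"entry": "get", "pos": "verb", "start": 7, "end": 10, "preferred": 0, "active": 1},
]
eliminate_contradictions(candidates)
```

In this example "get up" and "up to" overlap and both carry the flag "1"; rule (1) prefers the verb expression, so the active information of "up to" becomes "0" while "get up" stays at "1".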

The control section 1112 is a functional section that controls and regulates the operation and processing of each of the functional sections of the morphological analysis section 1016. It may be incorporated in the control section 1038, which controls the entire device.

The result of the morphological analysis is transferred to the analysis section I 1020 via the output interface circuit 1114. If the result is not transferred directly to the analysis section I 1020, it is stored in the analysis input file 1116 and in the analysis glossary information file 1118.

In this embodiment, all words, compound words and expressions that begin at the cut-out position of a glossary reference unit are used in the morphological analysis. Glossary information is therefore also obtained for the individual words that make up a compound word or expression, regardless of whether that compound word or expression is judged to be a single composite "unit" according to the highest preferred flag. That is, the degree of linkage between the words in the sentence is judged by referring to the highest preferred flag in the glossary information obtained in the morphological analysis. Compound words or expressions judged to have a strong link are assumed to be used as an expression in the sentence; where this is not the case, the possibility that they are used as separate words is also evaluated in parallel.

Such processing with the highest preferred flag is carried out according to the flow chart shown in Figure 4. The data of the input character array are received from the input section 1010 (1200); a glossary reference unit, with which the glossary file 1018 with the highest preferred flag can be searched, is cut out of the input character array (1201); the glossary 1018 is searched accordingly, and this is repeated up to the last position of the sentence represented by the input character array data (1202); contradictions in the highest preferred flag are then eliminated (1204); and the result of the morphological analysis is output to the analysis section I 1020 (1205).

In the input processing block 1200, the data read from the input document file 1102 or entered at the input device 1100 are first placed in the input character array buffer of the input interface circuit 1104 (1210, see Figure 5). The input character array data are entered, for example, as ASCII codes, and when the EOF (end of file) symbol is read, the processing section 1106 writes the null code at the last position in the input character array buffer.

The input character array is then reformed by the processing section 1106 (1211). For example, if two or more blank-equivalent characters occur in succession, they are corrected into a single blank character. Blank-equivalent characters include, for example, the blank character (represented by the symbol "⊔"), the tab, the line end, and so on. Blank-equivalent characters between the beginning of the input character array buffer and the first character that is not blank-equivalent are eliminated.

For example, the input character array "⊔⊔I⊔will⊔⊔get⊔up⊔to⊔go⊔to⊔a⊔white⊔house ..." is reformed, as shown in Figure 6, into "I⊔will⊔get⊔up⊔to⊔go⊔to⊔a⊔white⊔house⊔.... (null code)". The position of the null code indicates the last position in the buffer.
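The reforming step (1211) can be sketched as follows, assuming that the blank-equivalent characters are the space, the tab and the line-end characters. The function name `reform_input` is hypothetical, ordinary spaces stand in for the blank symbol, and the terminating null code is not modeled.

```python
import re


def reform_input(text: str) -> str:
    """Collapse runs of blank-equivalent characters and drop leading blanks."""
    # Assumption: space, tab, carriage return and newline are blank-equivalent.
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    return collapsed.lstrip(" ")


raw = "  I will  get up\tto go to a \nwhite house"
reformed = reform_input(raw)
```

Here `reformed` holds the sentence with single blanks between words and no leading blank, which matches the behaviour described for Figure 6.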

The glossary reference delimiters used in the cut-out operation 1201 for the glossary reference units are placed at the positions of characters other than alphabetic characters, numeric characters, quotation marks, hyphens and periods, and also at apostrophes that follow blank characters. The processing section 1106 has a start pointer that indicates where the glossary is to be referenced and that is initially set to the beginning of the buffer.

The glossary lookup section 1108 then searches the glossary file 1018 with the highest preferred flag, using the character array that begins with the character indicated by the start pointer and runs up to the character preceding the following delimiters; these characters form the retrieval key character array. The entries in the glossary and the retrieval key character array are compared with each other, and where they agree the glossary information is taken over (1203). Agreement is established if the entire character array of the glossary entry matches a leading part of the retrieval key character array and the character immediately after that part is a glossary reference delimiter, an apostrophe or a period. For example, as shown in Figure 7, if the start pointer points to the first character "g" of the retrieval key character array "get⊔upon⊔", then the glossary entries "get" and "get upon" agree with it.
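The agreement test can be sketched as a prefix match followed by a boundary check. This is a minimal illustration: the names `is_boundary` and `lookup` are hypothetical, the glossary is reduced to a list of headwords, a plain space stands in for the glossary reference delimiter, and treating the end of the key as a boundary is an added assumption.

```python
def is_boundary(key: str, pos: int) -> bool:
    """True if the key ends at pos or the character there may follow an entry."""
    if pos >= len(key):
        return True  # assumption: end of the key counts as a boundary
    # Assumption: a space is the delimiter; apostrophe and period also qualify.
    return key[pos] in " '."


def lookup(glossary, key):
    """Return every entry that matches a leading part of the retrieval key."""
    return [e for e in glossary if key.startswith(e) and is_boundary(key, len(e))]


glossary = ["get", "get up", "get upon", "up to", "go"]
matches = lookup(glossary, "get upon ")
```

With the key "get upon ", the entry "get up" is rejected because the character after it is "o", which is neither a delimiter, an apostrophe nor a period, while "get" and "get upon" are accepted.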

The retrieved glossary information is stored in the retrieved glossary information buffer 1120 of the processing section 1106. At the time of retrieval, the start position and the end position of the character array for which agreement was established are also stored. These positions specify the characters in the input buffer, counted sequentially from the beginning. A storage area for the active information, which indicates whether or not the retrieved glossary information is effective for the subsequent processing, is provided in the retrieved glossary information buffer 1120, and each item of active information is set to "1" in this step.

The start pointer is then advanced each time to the next dictionary reference unit and placed at the character immediately after the first delimiter that appears after the current position of the start pointer, so that the character array is scanned from left to right. The dictionary search is thus carried out successively. In the above example, the character taken as the beginning of a dictionary reference unit is first "I" for the word "I", then "w" for "will", and then "g" for "get". When the start pointer passes the null code, this is recognized as the last position (1202).
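The left-to-right movement of the start pointer might be sketched as follows. The sample sentence "I will get up to" is inferred from the positions quoted in the text, the function name `unit_starts` is hypothetical, the end of the Python string stands in for the null code, and skipping a run of consecutive delimiters is an added assumption.

```python
DELIMITERS = set(" ,")  # assumed delimiter characters


def unit_starts(buffer):
    """Return the 0-based positions at which each reference unit starts."""
    starts = []
    pos = 0
    n = len(buffer)
    while True:
        # Skip any delimiters in front of the next unit.
        while pos < n and buffer[pos] in DELIMITERS:
            pos += 1
        if pos >= n:  # end of buffer plays the role of the null code
            return starts
        starts.append(pos)
        # Move on to the first delimiter after the current position.
        while pos < n and buffer[pos] not in DELIMITERS:
            pos += 1


sentence = "I will get up to"
print(unit_starts(sentence))                         # prints: [0, 2, 7, 11, 14]
print([sentence[i] for i in unit_starts(sentence)])  # prints: ['I', 'w', 'g', 'u', 't']
```

The positions here are 0-based; adding 1 gives the 1-based positions used in the text, for example 8 for "get" and 12 for "up".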

Figure 9 shows an example of the dictionary information obtained in this way from the input English character array described above.

The contradiction elimination processing 1204, which is carried out by the contradiction elimination rule processing section 110 with reference to the top-priority contradiction elimination rule file 1036, will now be explained in conjunction with Figures 8A to 8D. The flow chart of Figures 8A and 8B illustrates the processing for the case in which the positions of words for which the top-priority flag is set overlap one another, while the flow chart of Figures 8C and 8D illustrates the processing by which an analysis unit is eliminated, with respect to the elements having the top-priority flag, by changing its active information to "0". In these flow charts, "<=" denotes a substitution, "->" denotes a reference, and "p->x" denotes the content x stored in the entry designated by the pointer p.

First, groups of words that each have the top-priority flag set to "1" and whose positions in the sentence overlap are detected (steps 1220-1223). The top-priority elimination rules are then applied to each detected group, and the effective entries are selected from it (steps 1224-1235).


In the embodiment described above, for example, the top-priority flag "1" is set for "get up", with start position "8" and end position "13", and for "up to", with start position "12" and end position "16", in the character array "get up to" shown in Figure 9, so that the character positions overlap. Selection rule (1) described above is applied first: with reference to the part of speech of the entry at the save pointer psave and the part of speech of the entry at the pointer p, it is judged whether or not a verb is involved (1224). Because a verb is found in this example, the combination "get up" is selected.

If rule (1) is not satisfied, rule (2) is applied (1228): the length lens of the character array of the entry referenced by the save pointer psave is compared with the length len of the character array of the entry referenced by the pointer p. If rule (2) is not satisfied either, rule (3) is applied (1229): the start position of the entry referenced by the save pointer psave is compared with the start position of the entry referenced by the pointer p.

If any one of the contradiction elimination rules (1) to (3), applied in this order, is satisfied, the active information of the entry that did not prevail, that is, the ineffective entry, is set to "0" (1232), while the active information of the other, effective entry is left at "1" (1231), which was its existing state. The contradiction elimination rules are applied successively in this way to each of the entries while the pointer p advances step by step (1234, 1235) to the last position, so that the active information remains at "1" only for the effective entries. The processing for the above example is illustrated in Figure 10. For the entry "up to", for example, the active information is set to "0".
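A rough sketch of how rules (1) to (3) might be applied to two overlapping top-priority entries is given below. The direction of each rule is an assumption made for illustration (a verb prevails, then the longer entry, then the earlier start), as is the part of speech given to "up to". The pairwise loop is a simplification of the psave/p pointer scan in Figures 8A and 8B, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    start: int       # 1-based position of the first character
    end: int         # 1-based position of the last character
    pos: str         # part of speech
    top: int         # top-priority flag
    active: int = 1  # active information, initially "1"


def overlaps(a, b):
    return a.start <= b.end and b.start <= a.end


def pick_winner(a, b):
    # Rule (1), assumed: an entry that is a verb prevails over a non-verb.
    a_verb, b_verb = a.pos == "verb", b.pos == "verb"
    if a_verb != b_verb:
        return a if a_verb else b
    # Rule (2), assumed: the longer character array prevails.
    len_a, len_b = a.end - a.start + 1, b.end - b.start + 1
    if len_a != len_b:
        return a if len_a > len_b else b
    # Rule (3), assumed: the entry that starts earlier prevails.
    return a if a.start <= b.start else b


def resolve_top_priority(entries):
    top = [e for e in entries if e.top == 1]
    for i, a in enumerate(top):
        for b in top[i + 1:]:
            if a.active and b.active and overlaps(a, b):
                loser = b if pick_winner(a, b) is a else a
                loser.active = 0  # the ineffective entry is set to "0"


get_up = Entry("get up", 8, 13, "verb", 1)
up_to = Entry("up to", 12, 16, "preposition", 1)
resolve_top_priority([get_up, up_to])
print(get_up.active, up_to.active)  # prints: 1 0
```

Applying the rules in a fixed order means that at most one rule decides each overlapping pair, which matches the "applied in this order" wording of the text.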

Next, the units that overlap, even only partially, an entry having both active information "1" and the top-priority flag "1" are detected (1236-1241), and their active information is set to "0" (1242, 1249). This contradiction elimination rule is applied in order to each of the entries while the pointer p advances step by step (1243, 1248) to the last position, and the active information of the ineffective entries is set to "0". For the entry "get up", for example, the active information of "get" and of "up" is therefore set to "0" (Figure 10). Because the top-priority flags of the entries "white", "white house", and "house" are all "0", their active information is kept at "1" even though their positions overlap.
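The second pass can be sketched in the same style. The sketch assumes that the single words "get", "up", and "to" carry a top-priority flag of 0 and that "up to" has already been deactivated by the first pass; the names `Entry` and `suppress_overlapped` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    start: int       # 1-based start position
    end: int         # 1-based end position
    top: int         # top-priority flag
    active: int = 1  # active information


def suppress_overlapped(entries):
    """Deactivate entries that overlap an active top-priority entry."""
    winners = [e for e in entries if e.top == 1 and e.active == 1]
    for e in entries:
        if e.top == 1:
            continue  # top-priority entries were settled in the first pass
        if any(e.start <= w.end and w.start <= e.end for w in winners):
            e.active = 0


entries = [
    Entry("get", 8, 10, 0),
    Entry("get up", 8, 13, 1),
    Entry("up", 12, 13, 0),
    Entry("up to", 12, 16, 1, active=0),
    Entry("to", 15, 16, 0),
]
suppress_overlapped(entries)
print([e.text for e in entries if e.active])  # prints: ['get up', 'to']
```

Entries that overlap only each other, with no top-priority flag involved, are left untouched, which corresponds to the "white", "white house", and "house" case in the text.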

When the processing has been carried out in this way up to just before the last position (null code), the contents of the input buffer of the input interface 1104 and of the retrieved dictionary information buffer 1120 are output through the output interface 1114 to the analysis section I 1016. The contents of the retrieved dictionary information buffer 1120 are output only for those entries whose active information is "1". The contents of the input buffer may, for example, be written to the analysis input file 1116, and the contents of the retrieved dictionary information buffer 1120 to the analysis dictionary information file 1118. Because in this case the active information and the top-priority flag are output as well, the structure of the analysis dictionary information file 1118 is identical to that of the retrieved dictionary information buffer. The method may also be carried out in such a way that the active information and the top-priority flag are not output.

The present invention will now be described more specifically with reference to the second embodiment.

Figure 11 illustrates the second embodiment of the language analyzer according to the present invention, applied to an automatic device for translating English into Japanese.

This embodiment has an input section 2014, to which data is supplied from an input unit 2010 or from an input document file 2012. The input unit 2010 comprises, for example, a keyboard with character keys such as alphanumeric keys and function keys, and an optical character reader for reading English text recorded on paper. The input document file 2012 is a storage unit in which English text is recorded on a storage medium such as a magnetic disk.

The input section 2014 includes an input character array buffer 2014a and stores the English input sentence, supplied from the input unit 2010 or from the input document file 2012, in the input character array buffer 2014a. The input section 2014 reads out the sentence stored in the input character array buffer 2014a and supplies it to the processing section 2016.

The processing section 2016 is a functional section that carries out the morphological analysis of the input sentence received from the input section 2014 by looking it up in a dictionary file. The processing section 2016 contains a dictionary information holding table 2016a and stores in it the information obtained by looking up a dictionary file 2022 or a basic unit dictionary file 2026, which is described later.

The processing section 2016 searches the dictionary using, as a unit, a retrieval key character array obtained from the character array that makes up the input sentence supplied from the input section 2014. The retrieval key character arrays are formed in order, starting with the first character of the character array that makes up the input sentence, in accordance with a predetermined formation rule. For example, the input sentence is divided in order from its beginning at delimiters such as spaces and commas, and each of the resulting character arrays is used as a retrieval key character array. In this case, character arrays that express units, such as m, km, and m/s, are each likewise formed as a retrieval key character array. The processing section 2016 sends the retrieval key character arrays obtained from the character array of the input sentence to the dictionary lookup section 2020.
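The formation of retrieval key character arrays by splitting at delimiters could look roughly like this. Only spaces and commas are treated as delimiters here, which is an assumption, since the text says "spaces, commas, etc." without listing the full set; the function name `key_arrays` and the sample sentence are hypothetical.

```python
import re


def key_arrays(sentence):
    """Split an input sentence into retrieval key character arrays."""
    # Runs of spaces and commas act as a single separator.
    return [tok for tok in re.split(r"[ ,]+", sentence) if tok]


print(key_arrays("It moves 5 km, at 3 m/s"))
# prints: ['It', 'moves', '5', 'km', 'at', '3', 'm/s']
```

Because "/" is not a delimiter, a unit expression such as m/s stays in one piece and can later be handed to unit recognition as a whole.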

The dictionary lookup section 2020 searches the dictionary 2022 on the basis of the retrieval key character arrays sent from the processing section 2016. The dictionary 2022 stores entries together with grammatical information such as the part of speech, in the manner shown in Figure 12. If an entry is present in the dictionary 2022, the dictionary lookup section 2020 reads out the part-of-speech information and so on for that entry and passes it to the processing section 2016. If the search finds no entry in the dictionary 2022, the dictionary lookup section 2020 reports this to the processing section 2016.

The processing section 2016 stores the part-of-speech information and so on obtained by the dictionary lookup section 2020 in the dictionary information holding table 2016a. If the dictionary 2022 contains no entry for the retrieval key character array, the processing section 2016 passes the retrieval key character array to a unit recognition section 2024.

The unit recognition section 2024 searches a basic unit dictionary 2026 on the basis of the retrieval key character array sent from the processing section 2016. The basic unit entries are stored in the basic unit dictionary 2026 in the manner shown in Figure 13. If the entry is present in the basic unit dictionary 2026, it is read out by the unit recognition section 2024. If no entry is present in the basic unit dictionary 2026, the retrieval key character array is divided into a number of character arrays, as described later, and the basic unit dictionary 2026 is searched again. If basic unit entries are then found in the basic unit dictionary 2026 during these subsequent searches, several items of unit information are obtained from the basic unit entries. If no basic unit entry is found in any of the searches, an indication is set that the word is not registered in the dictionary.

The unit recognition section 2024 passes the basic unit entry, the compound unit information, and the information indicating that the word is not registered in the dictionary to the processing section 2016. The processing section 2016 stores this information from the unit recognition section 2024 in the dictionary information holding table 2016a. The dictionary information holding table 2016a thus holds the entry for each retrieval key character array and the grammatical information, such as the part of speech, obtained by searching the dictionary 2022 or the basic unit dictionary 2026 on the basis of that retrieval key character array.

After the data has been stored in the dictionary information holding table 2016a, the processing section 2016 passes this data, together with the input sentence, to the output interface 2018. The output interface 2018 delivers the input sentence and the morphological analysis data received from the processing section 2016 to an output device 2030, such as a printer or a display unit, or to a storage file 2032, such as a magnetic disk storage.

Alternatively, the input sentence and the morphological analysis data delivered by the processing section 2016 may be fed directly to analysis means (not illustrated), so that the input sentence is analyzed in these analysis means and a translated sentence is then prepared on the basis of the analysis.

The control section 2028 controls the operation of each of the functional sections of the present device and is preferably implemented by means of a microprocessor.

The operation of the present device will be explained with reference to the flow chart shown in Figure 15.

First, the English input sentence is read from the input unit 2010 or the input document file 2012 into the input section 2014 (2100). The sentence read into the input section 2014 is stored in the input character array buffer 2014a. The sentence stored in the input character array buffer 2014a is then read out and passed to the processing section 2016.

When the input sentence has been supplied to the processing section 2016, the dictionary retrieval units are cut out (2102). That is, the character array that makes up the input sentence is divided, in accordance with predetermined rules and in order from the beginning of the character array, into retrieval key character arrays, which are the units used to search the dictionary 2022 or the basic unit dictionary 2026. It is then judged whether or not a retrieval key character array is present (2104) and, if so, the retrieval key character array is sent to the dictionary lookup section 2020.

When the retrieval key character array has been sent to the dictionary lookup section 2020, the dictionary lookup section 2020 searches the dictionary 2022 for it (2106). It is judged whether or not the retrieval key character array is present among the entries of the dictionary 2022 shown in Figure 12. If the entry is present, the grammatical information stored in the dictionary 2022, such as the part of speech, is read out, sent to the processing section 2016, and stored in the dictionary information holding table 2016a (2110). The flow then returns to step 2102 and a new dictionary retrieval unit is cut out.

If no entry is present in the dictionary 2022, the dictionary lookup section 2020 returns the retrieval key character array to the processing section 2016, and the processing section 2016 sends it to the unit recognition section 2024, where unit recognition is carried out (2112).
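The loop through steps 2102 to 2112 amounts to a dictionary lookup with a fallback to unit recognition. The sketch below illustrates that control flow only: the dictionary contents, the set of basic units, and the stand-in `recognise_unit` function are hypothetical, and the real unit recognition is more elaborate than a set membership test.

```python
# Hypothetical stand-ins for the dictionary 2022 and the basic unit dictionary 2026.
DICTIONARY = {
    "the": {"pos": "article"},
    "train": {"pos": "noun"},
    "runs": {"pos": "verb"},
}
BASIC_UNITS = {"m", "km", "s", "kg"}


def recognise_unit(key):
    """Simplified placeholder for the unit recognition of step 2112."""
    if key in BASIC_UNITS:
        return {"pos": "unit"}
    return {"unregistered": True}  # word not registered in any dictionary


def analyse(keys):
    """Look up each key; fall back to unit recognition when no entry exists."""
    table = []  # plays the role of the holding table 2016a
    for key in keys:                    # one retrieval unit per pass (2102, 2104)
        info = DICTIONARY.get(key)      # dictionary search (2106)
        if info is None:
            info = recognise_unit(key)  # unit recognition (2112)
        table.append((key, info))
    return table


for row in analyse(["the", "train", "runs", "km", "zzz"]):
    print(row)
```

Keeping ordinary words and unit expressions in separate dictionaries means the fallback is only reached for keys the main dictionary does not know.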

Where the retrieval key character array sent to the dictionary lookup section 2020 consists of ordinary words such as nouns and verbs, entries for most of them are present in the dictionary 2022, so the grammatical information such as the part of speech is read out from the dictionary 2022, sent to the processing section 2016, and registered in the dictionary information holding table 2016a. As described above, the dictionary 2022 contains entries for ordinary words such as verbs and nouns, but no entries for character arrays that express units. Consequently, where the retrieval key character array expresses a unit, such as kg or m/s, no entry is present in the dictionary 2022 and the flow proceeds to step 2112 for unit recognition.

The unit recognition operation in step 2112 will be explained with reference to Figure 16.

When a retrieval key character array for which the search of the dictionary 2022 found no entry is sent from the processing section 2016 to the unit recognition section 2024, the unit recognition section 2024 sets the pointer P to the character at the beginning of the retrieval key character array (2200).

The unit recognition section 2024 then searches the fundamental unit glossary 2026 for the character array starting at the character to which the pointer P is set (2202). In this search it is judged whether a fundamental unit that has an entry in the fundamental unit glossary 2026 appears, as a complete character array, at the start of the character array beginning at the character to which the pointer P is set. That is, it is determined whether a character array of a number of characters, starting at the character to which the pointer P is set, matches any of the fundamental units that have an entry in the fundamental unit glossary 2026. For example, where the character to which the pointer P is set is k, m, s, etc., the fundamental unit glossary 2026 contains entries for these individual characters, as shown in Figure 13.

The unit recognition section 2024 judges, from the result of the search, whether an entry is present in the fundamental unit glossary 2026 (2204) and, if an entry is present, advances the pointer P by the length of the recognized fundamental unit (2206). Thus, where the fundamental unit is k, m, s, etc., the pointer P is advanced by one character and is set to the next character in the recovery key character array.

The unit recognition section 2024 then judges whether any character array remains starting at the character to which the pointer P is set (2208). If so, the flow returns to step 2202, where the fundamental unit glossary 2026 is searched again with the character array starting at the character to which the pointer P is set. The section again judges whether an entry is present in the fundamental unit glossary 2026 (2204) and, if an entry is present, advances the pointer P by the length of the recognized fundamental unit.

If, in step 2208, no character array remains starting at the character to which the pointer P is set, the search of the fundamental unit glossary 2026 is ended; that is, recognition of the composite unit has succeeded.

For example, where the recovery key character array sent to the unit recognition section 2024 is km/s, which represents a unit, no entry for it is present in the fundamental unit glossary 2026 because km/s is itself a composite unit. The pointer P is first set to k (2200), and k is found in the fundamental unit glossary 2026, confirming the presence of an entry (2202).

The pointer P is then set to m (2206), and m is found in the fundamental unit glossary 2026 (2202), confirming the presence of an entry in the same manner. Because the unit recognition section 2024 regards a slash (/), a period (.), etc. as part of a unit, the pointer P is next set to s, skipping the "/" in km/s (2206). Then s is found in the fundamental unit glossary 2026, again confirming the presence of an entry (2202). Because entries are found for each of k, m and s in the search of the fundamental unit glossary 2026, km/s is judged to be a character array that expresses a unit. In this manner, the recovery key character array is judged to be a character array representing a unit when entries are present in the fundamental unit glossary 2026 for all of its characters, or for all of its characters except symbols such as slashes and periods that are regarded as part of the unit.
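The pointer-driven search described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the set FUNDAMENTAL_UNITS stands in for the fundamental unit glossary 2026, its contents and the function name recognize_unit are hypothetical, and trying the longest candidate first is an assumption that the description does not specify.

```python
# Hypothetical stand-in for the fundamental unit glossary 2026.
FUNDAMENTAL_UNITS = {"k", "m", "s", "g"}
# Symbols regarded as part of a unit; they are skipped, not looked up.
UNIT_SYMBOLS = {"/", "."}


def recognize_unit(key, units=FUNDAMENTAL_UNITS):
    """Return True if `key` consists only of fundamental units and unit symbols."""
    p = 0                       # step 2200: pointer P at the first character
    matched_any = False
    while p < len(key):         # step 2208: does a character array remain?
        if key[p] in UNIT_SYMBOLS:
            p += 1              # skip "/" or "." (part of step 2206)
            continue
        # step 2202: search for a fundamental unit starting at P.
        # Assumption: longer candidates are tried before shorter ones.
        match = next(
            (key[p:p + n] for n in range(len(key) - p, 0, -1)
             if key[p:p + n] in units),
            None,
        )
        if match is None:       # step 2204: no entry, so not a unit
            return False
        p += len(match)         # step 2206: advance P past the unit
        matched_any = True
    return matched_any
```

With these assumed glossary contents, recognize_unit("km/s") walks k, m, skips "/", then finds s, mirroring the example in the text, while a key such as "kx" fails at step 2204.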

When the unit recognition section 2024 has finished searching the fundamental unit glossary 2026 and has succeeded in recognizing the composite unit, the unit information obtained in this way is transferred to the processing section 2016, where it is stored in the glossary information retention table 2016a (2210).

Unit recognition is thereby completed.

If, in step 2204, the search of the fundamental unit glossary 2026 finds no entry for the character array starting at the character to which the pointer P is set, the character array cannot be recognized as either a fundamental unit or a composite unit. The unit recognition section 2024 therefore returns to the processing section 2016 information indicating that the character array is a word not registered in the glossary, that is, that the word does not represent a unit. This information is stored in the glossary information retention table 2016a of the processing section 2016, which completes the unit recognition.

Referring again to Figure 15, when the unit recognition (2112) has ended, the flow returns to step 2101 and the processing section 2016 again cuts out a glossary reference unit.

After cutting out the glossary reference unit, the processing section 2016 judges whether a cut-out unit is still present (2104). If no cut-out unit, that is, no recovery key character array, remains, the information stored in the glossary information retention table 2016a is delivered to the output device through the output interface circuit 2018 (2114). The analysis of the input sentence is thereby completed.
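The overall loop of the second embodiment (look up each cut-out unit, fall back to unit recognition, store the result, output the table when no unit remains) might be sketched as follows. The function name analyze, the record layout and the callable recognize_unit are hypothetical; the dictionary argument merely stands in for the glossary 2022.

```python
def analyze(keys, glossary, recognize_unit):
    """Build a list standing in for the glossary information retention table 2016a.

    keys           -- recovery key character arrays, already cut out
    glossary       -- mapping from word to its grammatical information
    recognize_unit -- callable returning True when a key expresses a unit
    """
    table = []
    for key in keys:
        entry = glossary.get(key)
        if entry is not None:
            # Entry present: register its grammatical information.
            table.append({"key": key, **entry})
        elif recognize_unit(key):
            # No entry, but the key was recognized as a (composite) unit.
            table.append({"key": key, "kind": "unit"})
        else:
            # Neither a glossary word nor a unit: an unregistered word.
            table.append({"key": key, "kind": "unregistered"})
    return table  # delivered to the output device once no unit remains
```

Passing the unit recognizer in as a parameter keeps this sketch independent of how the fundamental unit glossary is actually searched.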

As described above, in this embodiment the English input sentence is divided into recovery key character arrays and the ordinary glossary 2022 is searched first; if no entry is present in the glossary 2022, unit recognition is performed. In unit recognition, the recovery key character array is divided into partial character arrays indexed by the pointer P, and the fundamental unit glossary 2026 is searched for each partial character array. Character arrays that are registered in the fundamental unit glossary 2026, or that are composed of a sequence of arrays registered in the fundamental unit glossary 2026, are then judged to be character arrays that represent units.

Because unit recognition can thus be performed even for a character array that expresses a composite unit, by combining fundamental units stored in the fundamental unit glossary 2026, the analysis can cope with the variety of ways in which units are expressed. Moreover, because the fundamental unit glossary 2026 need only contain fundamental units such as k, m, s, etc., and composite units formed by combining them, such as km and km/s, need not be stored, the capacity of the glossary can be reduced.

The third embodiment of the present invention will now be explained.

Figure 18 illustrates the overall structure of the third embodiment, in which the language analysis device according to the present invention is applied to automatic translation from English into Japanese.

This embodiment includes an input section 3010 through which an English text 3012 to be translated into Japanese is input. The input section 3010 may include, for example, a keyboard with character keys such as alphanumeric keys and function keys, an optical character reader (OCR) for reading English text recorded on paper, and/or a file storage device for reading English text recorded on a storage medium such as a magnetic disk.

The English text input from the input section 3010 is read into a pre-editing section 3014, in which pre-processing for the translation is carried out. Here, mainly sentence recognition and processing of unknown words are performed. This functions as part of the morphological analysis.

After pre-editing, the English data is transferred, together with information obtained during pre-editing, to a morphological analysis section 3016. The section 3016 analyzes the morphemes of the English sentence while dividing it with reference to the glossary 3018, performs various classifications such as processing of unknown words, proper nouns, time expressions and numerals, and performs processing on the sentence as a whole, such as searching for standard expressions and adjectival expressions. The morphological analysis rules are stored in the analysis rule file 3036.

After the morphological analysis, the English data is transferred, together with the glossary information obtained from the morphological analysis, to an analysis section I 3020. The analysis section I 3020 is a functional section that analyzes the surface structure of the sentence by applying grammatical rules to the English data and examining the possible structures.

The English data analyzed in the analysis section I 3020 is transferred, together with the analysis information, to an analysis section II 3022. In this section one solution is selected by applying a syntactic analysis based on the result of the surface structure analysis by the analysis section I 3020, thereby producing a plausible syntactic analysis of the English sentence and forming its structure. These analysis rules are also stored in the analysis rule file 3036.

The English data, together with the syntactic analysis data resulting from the analysis, is transferred to a structure transformation section 3024. The structure transformation section 3024 prepares the syntactic analysis of a corresponding Japanese sentence from the structure obtained, which is an intermediate structure of the English sentence, so as to transform it into the underlying Japanese structure from which a Japanese sentence can easily be generated.

The syntax data transformed in this way, which indicates the underlying Japanese structure, is sent to a translation forming section 3026, in which the translated sentence is formed. This is a functional section that forms a Japanese sentence from the syntax data of the Japanese sentence.

The Japanese sentence data formed as the translated sentence, that is, the translated data, is transferred to a post-editing section 3030. The post-editing section 3030 modifies the translated data by searching the glossary 3018, using information that was also used during the translation process, in order to complete a more natural Japanese sentence. The data for the Japanese sentence is transferred to an output section 3032 and output from there as the translated Japanese sentence 3034. The output section 3032 includes, for example, a printer, a display unit and/or a storage unit such as a magnetic disk.

The flow of this series of translation operations is controlled by a control section 3038, which is responsible for controlling the entire device.
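The order in which the sections of the third embodiment hand data to one another can be pictured as a simple pipeline. The sketch below is purely illustrative: the stage bodies are placeholders that only record their names, and make_stage, PIPELINE and translate are hypothetical names, not part of the described device.

```python
def make_stage(name):
    """Create a placeholder stage that records its name and passes data on."""
    def stage(data):
        # A real stage would transform data["payload"]; this one only logs.
        return {**data, "trace": data["trace"] + [name]}
    return stage


# Stage order as given in the description of the third embodiment.
PIPELINE = [make_stage(n) for n in (
    "pre-editing 3014",
    "morphological analysis 3016",
    "analysis I 3020",
    "analysis II 3022",
    "structure transformation 3024",
    "translation forming 3026",
    "post-editing 3030",
    "output 3032",
)]


def translate(english_text, pipeline=PIPELINE):
    """Play the role of the control section 3038: run the stages in order."""
    data = {"payload": english_text, "trace": []}
    for stage in pipeline:
        data = stage(data)
    return data
```

Each stage receives the data together with whatever information the previous stages attached, which is how the description has analysis information travel alongside the English data.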

In this embodiment the glossary 3018 contains glossary data for English and Japanese words, in which various information is recorded in addition to the vocabulary, such as interrelations, meanings, plural or singular form, and part of speech. The analysis rule file 3036 contains rule data for the morphological analysis and for the syntactic analysis.

The control section 3038 is coupled to an operation display section 3040. The operation display section 3040 includes, for example, operating keys such as translation indicator keys and cursor keys with which an operator can give various instructions to the device, and an indicator that visibly displays the input English text, the Japanese text resulting from the translation, and intermediate data such as the glossary information, and that also provides various indications to the operator. Many of these operation and indication functions can be implemented in a keyboard provided in the input section 3010 or on a display unit provided in the output section 3032.

Reference is made to Figure 17, which shows an example of a detailed structure for processing numbers in the morphological analysis section 3016. The section 3016 naturally contains other functional analysis sections as well, but only those parts directly relevant to an understanding of the present invention are illustrated here. The morphological analysis is performed by instructing a search of the glossary, successively from the beginning of the input character array in accordance with recovery key character arrays, and by processing the glossary information obtained from the glossary lookup section 3104 in accordance with a numeric flag, which is described below.

The morphological analysis section 3016 includes an input processing section 3100 for receiving and processing the input character array data from the pre-editing section 3014. The input processing section 3100 has an input character array buffer that receives the English character array data in the form of coded data, for example in ASCII code, and stores the character array data temporarily.

The input character array data temporarily stored in the input processing section 3100 is sent to a unit cut-out section 3102, which divides the input character array data into glossary lookup units such as words. The unit cut-out section 3102 is a functional section that separates out the glossary reference units, which serve as the recovery key character arrays for searching the glossary 3018 in the glossary lookup section 3104. The glossary reference delimiter used for cutting out a glossary reference unit is placed at the position of any character other than an English letter, a numeral, an apostrophe, a hyphen or a period, and also at an apostrophe that follows a blank character. These delimiters are stored in a delimiter table 3108 and are referred to when the unit cut-out section 3102 cuts out a glossary reference unit.
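One possible reading of the delimiter rule can be sketched as follows. It assumes that every character other than an English letter, a numeral, an apostrophe, a hyphen or a period acts as a delimiter, and that an apostrophe directly after a blank is also a delimiter. The names WORD_CHARS and cut_units are hypothetical, and the set below merely stands in for the delimiter table 3108.

```python
import string

# Characters that may appear inside a glossary reference unit (assumed reading).
WORD_CHARS = set(string.ascii_letters + string.digits + "'-.")


def cut_units(text):
    """Split `text` into glossary reference units at delimiter positions."""
    units, current, prev = [], [], " "
    for ch in text:
        # A delimiter is any non-word character, or an apostrophe after a blank.
        is_delimiter = ch not in WORD_CHARS or (ch == "'" and prev.isspace())
        if is_delimiter:
            if current:
                units.append("".join(current))
                current = []
        else:
            current.append(ch)
        prev = ch
    if current:
        units.append("".join(current))
    return units
```

Under this reading "10.2" stays in one unit because the period is not a delimiter, whereas "1,000" is cut at the comma, so a grouped number would have to be reassembled by the later number processing.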

In particular, the glossary 3018 contains information for the cut-out units. As the example of entry information in Figure 24 shows, the glossary data stores for each entry, that is, for each word, grammatical information such as the part of speech, a distinguishing indication that the word represents a number, that is, a numeric flag, and numeric value information giving the numerical value that the word stands for.

As illustrated in the figure, singular and plural forms are listed side by side among the entries of the glossary 3018, and each of them forms an entry of its own.

The numeric flag indicates that the word represents a number when the flag is set to "1". Further information is also registered, for example countability or uncountability for a noun, identification of transitive or intransitive verbs, translated words, and so on. Taking "thousand" as an example, because it is a noun that represents a number, its numeric flag is set to "1" and its numeric value is given as "1000". By contrast, "thread" is a noun but not one that denotes a number, so its numeric flag is set to "0".
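An entry carrying a numeric flag and a numeric value could be modelled as shown below. This is a minimal sketch: the class GlossaryEntry, its field names and the helper is_number_word are hypothetical, and only the values for "thousand" and "thread" are taken from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlossaryEntry:
    word: str
    part_of_speech: str
    numeric_flag: int = 0                 # 1 when the word represents a number
    numeric_value: Optional[int] = None   # only meaningful when the flag is 1


# Singular and plural forms are separate entries, as in the description.
GLOSSARY = {e.word: e for e in (
    GlossaryEntry("thousand", "noun", numeric_flag=1, numeric_value=1000),
    GlossaryEntry("thread", "noun"),
    GlossaryEntry("threads", "noun"),
)}


def is_number_word(word, glossary=GLOSSARY):
    """Return True when `word` is registered with its numeric flag set to 1."""
    entry = glossary.get(word)
    return entry is not None and entry.numeric_flag == 1
```

Keeping the flag separate from the value lets the analysis decide cheaply whether a word is a number before it needs the value itself.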

A number is recognized by its numeric flag when the word is registered in the glossary 3018, for example "one" or "thousand". Unregistered words are also recognized as numbers: a sequence of numeric characters such as "123"; two groups of consecutive digits separated by a period, i.e. a decimal number such as "10.2"; and a sequence of numeric characters separated by commas such as "1,000,000". In the present description the term "numeric character" is generally used to cover not only Arabic numerals but also spelled-out expressions such as "thirteen".
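The three unregistered forms named above can be tested with simple patterns. This is a sketch under assumptions: the description does not say how the forms are detected, and the requirement that comma-separated groups contain exactly three digits is an assumption of this example. Mixed forms such as "1,000.5" are not covered here.

```python
import re

# Hypothetical patterns for the three unregistered number forms in the text.
PLAIN = re.compile(r"\d+")                  # e.g. "123"
DECIMAL = re.compile(r"\d+\.\d+")           # e.g. "10.2"
GROUPED = re.compile(r"\d{1,3}(?:,\d{3})+") # e.g. "1,000,000" (3-digit groups assumed)


def looks_like_number(token: str) -> bool:
    """Return True if the whole token matches one of the three forms."""
    return any(p.fullmatch(token) for p in (PLAIN, DECIMAL, GROUPED))
```

Using `fullmatch` rather than `search` keeps tokens such as "12abc" from being accepted on the strength of a numeric prefix.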

As shown in Figure 28, the glossary 3018 contains a currency unit table 3018a in which various currency symbols are registered, a notation symbol table 3018b in which notation symbols (digit-group separators) such as "," and "(space)" are registered, and a decimal point table 3018c in which decimal points such as "." are registered. Separate tables for notation symbols and decimal points are provided because the usage differs between languages: in Japanese and English "," serves as the notation symbol and "." as the decimal point, whereas in other European languages such as French and German the space or "." is mainly used as the notation symbol and "," as the decimal point.
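The role of the two separator tables can be sketched as a per-language lookup. The table contents below follow the common conventions (comma as group separator and period as decimal point in English and Japanese, the reverse with space or period as group separator in French and German). The dictionary names, language codes and the function `classify_character` are assumptions of this example, not the contents of tables 3018b and 3018c.

```python
# Hypothetical stand-ins for the notation symbol table and the decimal point table.
NOTATION_SYMBOLS = {"en": {","}, "ja": {","}, "fr": {" ", "."}, "de": {" ", "."}}
DECIMAL_POINTS = {"en": {"."}, "ja": {"."}, "fr": {","}, "de": {","}}


def classify_character(ch: str, lang: str) -> str:
    """Classify one character as 'digit', 'notation', 'decimal' or 'other'."""
    if len(ch) == 1 and ch in "0123456789":
        return "digit"
    if ch in NOTATION_SYMBOLS[lang]:
        return "notation"
    if ch in DECIMAL_POINTS[lang]:
        return "decimal"
    return "other"
```

The same character therefore receives a different role depending on the language: "," is a notation symbol for "en" but a decimal point for "de".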

The glossary lookup section 3104 is a functional section that sorts out the glossary information by searching the glossary 3018 on the basis of the lookup character array supplied by the unit cut-out section 3102, and that passes this information to the processing sections 3110, 3112 and 3116.

Consecutive numeric characters are combined by the following two treatments. First, when a word has been recognized as a number as described above and the next glossary lookup unit is also recognized as a number, the two are combined and synthesized into a single number. This operation is repeated for as long as numbers follow one another. For example, "30 thousand" is converted to "30000" and "1.5 million" to "1500000". Second, when the numeric expression continues across an interposed "and", the two parts are synthesized into one number, in keeping with the meaning of the numeric expression, provided that every digit of the number to the left of "and" that corresponds to a digit position of the number to the right of "and" is "0". For example, "one hundred and thirty" is synthesized into "130", and "30 thousand and two hundred" into "30200".

After a number has been recognized in this way, the further local analysis that is required is carried out. In this process a series of analysis units, activated by the analysis activation information of each analysis unit, is combined into a single analysis unit on the basis of a local analysis rule. A currency symbol and a numeric value, for example "¥1,000", are combined into "1000 yen", and a numeric value with a unit, for example "1.5 km", is combined into "1.5 kilometers".

These combinations are carried out in the processing sections 3110 to 3122. The processing section 3110 is a functional section that combines a number with a currency symbol or a unit. The processing section 3112 is a functional section that converts a number into its numeric value. The processing section 3114 is a functional section that processes numbers joined by a hyphen. The processing section 3116 is a functional section that processes consecutive numeric characters.

For a number that has been combined with a currency symbol or a unit, the currency symbol and the numeric value are merged into a single noun in the processing section 3118, and the unit and the numeric value are merged into a single noun in the processing section 3120. A number that has undergone numeric conversion, a hyphenated number, or a run of consecutive numeric characters is further combined with the preceding numeric value in the processing section 3122. When these operations are complete, the glossary information for the input character array is stored in the buffer for sorted-out glossary information, that is, the glossary information retention table 3124.

The results of the morphological analysis are transferred from the glossary information retention table 3124 to the analysis section I 3020.

Processing based on the numeric flag is carried out in the order shown in Figures 19A and 19B. The data of the input character array is received in the input processing section 3100, where the input processing is performed (3200). The unit cut-out section 3102 then cuts the input character array into the glossary lookup units with which the glossary 3018 is to be searched (3201). The glossary lookup section 3104 searches the glossary 3018 accordingly (3203), and if a glossary entry is present (3204) the numeric flag is examined (3205). If the numeric flag is not set, because the word does not denote a number, the glossary information is accumulated in the glossary information retention table 3124. If the numeric flag is set to "1", the number is converted into its numeric value in the processing section 3112 (3206) and combined (3207) with the preceding numeric value in the processing section 3122. Once these operations have reached the end of the sentence given by the input character array data (3202), the combination (3209) with the currency symbol or the unit is carried out in the processing sections 3118 and 3120, and the result of the morphological analysis is supplied to the analysis section I 3020 (3210).

If the glossary search finds no entry in step 3204 and the element is hyphenated (3212), the processing for a hyphenated number (3213) is carried out in the processing section 3114. If the element is not hyphenated but its first character is a currency symbol (3214), only the currency symbol is entered in the glossary information retention table 3124 (3216) and the currency symbol is removed (3217) from the glossary lookup unit. If the first character is not a currency symbol (3214), the processing for consecutive numeric characters (3215) is carried out in the processing section 3116. This operation is continued up to the last position (3202).
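The branching of Figures 19A and 19B, as far as the text describes it, can be summarized as a dispatch on a single lookup unit. This is a simplified sketch: the function `choose_branch`, the branch labels and the dictionary-based glossary are assumptions of this example, and the loop over the sentence and the later combination steps are left out.

```python
def choose_branch(unit, glossary, currency_symbols):
    """Pick the processing branch for one glossary lookup unit.

    glossary maps a word to a dict with a 'numeric_flag' key (hypothetical layout).
    """
    if unit in glossary:                                  # entry found (3204)
        if glossary[unit]["numeric_flag"] == 1:           # flag examined (3205)
            return "convert_number"                       # section 3112
        return "store_word"                               # kept in table 3124
    if "-" in unit:                                       # hyphenated (3212)
        return "hyphenated_number"                        # section 3114
    if unit[:1] in currency_symbols:                      # currency symbol first (3214)
        return "split_currency_symbol"                    # steps 3216, 3217
    return "consecutive_numeric_characters"               # section 3116
```

Returning a label rather than calling the handlers directly keeps the sketch self-contained; in a fuller version each label would map to the corresponding processing section.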

The combination 3209 with the currency symbol and the unit is carried out in the processing section 3110 according to the processing flowchart shown in Figure 20. In the initial processing 3220, the processing pointer is first set to the beginning of the buffer. If the element indicated by the pointer is not a numeric value (3221), the pointer is advanced by one step (3226). If the element is a numeric value but has neither a preceding currency symbol nor a following unit, the pointer is likewise advanced by one step (3222, 3224). The processing continues up to the last position of the glossary lookup unit (3227).

If the element is a numeric value preceded by a currency symbol (3222), the currency symbol and the numeric value are combined into a single noun (3223). A currency symbol and a numeric character such as "¥1,000", for example, are combined into a single noun. If the preceding element is not a currency symbol and the following element is a unit, the numeric value and the unit are combined into a single noun (3225). A numeric character and a unit such as "1.5 km", for example, are combined into a single noun. The processing is carried out up to the last position of the glossary lookup unit (3227).
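The pass of Figure 20 can be illustrated as a single scan over the buffered items. This is a minimal sketch, assuming that numeric values are held as Python numbers and everything else as strings. The tables `CURRENCY` and `UNITS`, their contents and the string form of the resulting noun are hypothetical.

```python
# Hypothetical lookup tables; the real tables are part of the glossary 3018.
CURRENCY = {"¥": "yen", "$": "dollars"}
UNITS = {"km": "kilometers", "kg": "kilograms"}


def combine_currency_and_units(items):
    """Merge 'symbol + number' and 'number + unit' into one noun each."""
    out = []
    i = 0
    while i < len(items):
        item = items[i]
        is_num = isinstance(item, (int, float))
        if is_num and out and out[-1] in CURRENCY:
            # Preceding currency symbol: merge symbol and value (3223).
            symbol = out.pop()
            out.append(f"{item} {CURRENCY[symbol]}")
        elif is_num and i + 1 < len(items) and items[i + 1] in UNITS:
            # Following unit: merge value and unit (3225), then skip the unit.
            out.append(f"{item} {UNITS[items[i + 1]]}")
            i += 1
        else:
            out.append(item)      # nothing to merge, just advance (3226)
        i += 1
    return out
```

For instance, `combine_currency_and_units(["¥", 1000])` yields `["1000 yen"]`, and `combine_currency_and_units([1.5, "km"])` yields `["1.5 kilometers"]`, in line with the two examples in the description.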

The processing for hyphenated numbers is carried out in the processing section 3114 according to the processing flowchart shown in Figures 21A and 21B. In the initial preprocessing 3230, the hyphenated glossary lookup unit is first stored in the buffer. In addition, the retained numeric value is set to "0" and the hyphen in the original glossary lookup unit is changed to a space.

A glossary lookup unit is then cut out (3231) and looked up in the glossary (3235). If the search finds no entry, that is, if the word is not registered in the glossary (3236), the entire glossary lookup unit is held in the glossary information retention table 3124 as a word not registered in the glossary (3237).

If the search yields an entry (3236), the entry is examined to see whether its numeric flag is "1". If the numeric flag is not "1", the word is not a numeric character, and the entire glossary lookup unit is held in the glossary information retention table 3124 as a word not registered in the glossary (3237).

If the numeric flag of the glossary entry is set to "1", the number is converted into its numeric value in the processing section 3112 on the basis of the entry (3239). The converted numeric value is then added to the numeric value retained at that moment (3240), and the sum is retained (3241). For example, the "2" of "two" in "twenty-two" is added to the "20" of the preceding "twenty" to give "22". The processing is carried out up to the last position of the glossary lookup unit (3232).

When the last position is reached in this step-by-step process, the flowchart proceeds from step 3232 to the processing 3233, in which the retained numeric value is substituted, as a numeric value, for the entire hyphenated glossary lookup unit. The combination 3207 of this numeric value with the preceding numeric value is then carried out.
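The hyphen processing of Figures 21A and 21B amounts to splitting the unit at its hyphens and summing the values of the parts. The sketch below assumes a small hypothetical table `NUMBER_WORDS` in place of the glossary, and it signals the "unregistered word" outcome by returning `None`.

```python
# Hypothetical stand-in for glossary entries whose numeric flag is 1.
NUMBER_WORDS = {"one": 1, "two": 2, "twenty": 20, "thirty": 30}


def parse_hyphenated(unit, number_words=NUMBER_WORDS):
    """Return the summed value of a hyphenated number, or None if any part is not a number."""
    total = 0                                   # retained value starts at 0 (3230)
    for part in unit.replace("-", " ").split(): # hyphen changed to a space (3230)
        value = number_words.get(part)          # glossary lookup (3235)
        if value is None:
            return None                         # whole unit kept as unregistered (3237)
        total += value                          # add to the retained value (3240)
    return total                                # replaces the whole unit (3233)
```

With this table, `parse_hyphenated("twenty-two")` gives 22, while a unit such as "well-known" gives `None` and would be stored unchanged.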

The processing of consecutive numeric characters 3215, which is carried out in the processing section 3116, will now be explained with reference to Figures 22A and 22B. In these flowcharts the symbol "=" denotes an assignment. First, the initialization 3250 is performed, in which the retained numeric value val-save is set to "0", the parameter "i" is set to "1", and the pointer is set to the beginning of the character array of the glossary lookup unit.

It is then checked whether the character *p designated by the pointer p is a numeric character (3251), a notation character (3252) or a decimal point (3253). If it is none of these, the entire character array is stored in the glossary information retention table 3124 as a word not registered in the glossary (3255). If it is a decimal point (3253), the parameter "i" is multiplied by 10 (3254) and step 3258 is executed. In step 3258, the numeric value num(*p) of the character *p is added to the retained numeric value val-save to obtain the new retained numeric value. Here num(*p) is the value of the character *p taken as a numeric value.

If, in step 3251 or 3252, the character is a numeric character or a notation character, step 3257 is executed. In step 3257, the retained numeric value val-save is multiplied by 10 and the numeric value num(*p) of the character *p is added to obtain the new retained numeric value.

After these operations the pointer is advanced by one step (3259), and the operation is repeated up to the last position of the glossary lookup unit (3260). When the last position of the character array has been reached, the retained numeric value is the numeric value of the entire character array (3261), and the combination 3207 with the preceding numeric value is carried out in the processing section 3122. Through this processing, consecutive numeric characters such as "1,000.5" are analyzed as the numeric value "1000.5".
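The character-by-character evaluation of Figures 22A and 22B can be sketched as follows. The text does not spell out how notation characters contribute a value or how the parameter i scales the digits after the decimal point. This sketch therefore assumes that notation characters are skipped and that each digit after the decimal point is divided by i, with i growing by a factor of 10 per digit. The function and parameter names are hypothetical.

```python
def parse_numeric_characters(chars, notation=",", decimal_point="."):
    """Evaluate a run such as '1,000.5'; return None if it is not a number."""
    val_save = 0          # retained numeric value (3250)
    i = 1                 # stays 1 until a decimal point has been seen (3250)
    for ch in chars:
        if ch in "0123456789":
            if i == 1:
                # Integer part: shift left one decimal place and add the digit.
                val_save = val_save * 10 + int(ch)
            else:
                # Fractional part (assumed scaling): add digit / i, then grow i.
                val_save = val_save + int(ch) / i
                i *= 10
        elif ch in notation:
            continue      # assumed: a digit-group separator carries no value
        elif ch == decimal_point and i == 1:
            i = 10        # first decimal point switches to the fractional part
        else:
            return None   # not a numeric run: treated as an unregistered word
    return val_save
```

Under these assumptions `parse_numeric_characters("1,000.5")` evaluates to 1000.5, the result stated in the description. For French or German input the two separator arguments would be swapped accordingly.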

The combination 3207 with the preceding value is carried out in the processing section 3122 as follows. First, the pointer p into the glossary table is set to the position preceding the glossary lookup unit (3270). If there is nothing at this position, the numeric value is the first item in the retention table, and the numeric value of the current glossary lookup unit is registered in the glossary retention table 3124 (3284). The registration position is the position following the one currently designated by the pointer p.

If a word is present at the preceding position in step 3271, and the entry designated by the pointer p is not "and" (3272) and is not a numeric value (3273), the numeric value of the current glossary lookup unit is registered in the glossary retention table 3124 at the position next to the one currently designated by the pointer p (3284). In the example "To him two ...", "two" is now registered as the numeric value "2".

If, in step 3273, the pointer designates a numeric value, the numeric value p->v of the entry designated by the pointer p is multiplied by the numeric value v-nu of the current glossary lookup unit to form the new numeric value p->v of the entry designated by the pointer p (3274). In the case of "two thousand", for example, "2 x 1000 = 2000" is computed, so that the whole of "two thousand" is combined into a single item. The end position of the current glossary lookup unit is then set as the end position of the entry designated by the pointer p, that is, the p end position (3282).

If, in step 3272, the entry designated by the pointer p is "and", the pointer p is moved to the preceding glossary lookup unit (3275). If this is not the last position (3276) and the entry is a numeric value (3277), the numeric value v-nu of the current glossary lookup unit is rounded up at its most significant digit to the next power of ten, and the result is set as the value vl. For example, if the numeric value v-nu of the current glossary lookup unit is "8", "8.1", "98" or "11", the value of vl becomes "10", "10", "100" or "100", respectively.

It is then checked whether the remainder obtained by dividing the numeric value p->v of the entry designated by the pointer p by vl, that is, mod(p->v, vl), is equal to "0" (3279). If it is not equal to "0", the pointer p is incremented (3283), and the numeric value of the current glossary lookup unit is registered in the glossary retention table 3124 at the position next to the one designated by the current pointer p (3284). In the case of "I and two", for example, "two" is now registered as the numeric value "2".

If the remainder in step 3279 is equal to "0", the numeric value v-nu of the current glossary lookup unit is added to the numeric value p->v of the entry designated by the pointer p to form the new numeric value p->v of that entry (3280). In the case of "two thousand and two", for example, "two thousand" has already been combined into "2000" at this stage. The value "2" of "two" is then added in the addition 3280, so that the whole expression finally becomes "2002". The information "and" designated by the pointer p+1 is then removed from the information retention table 3124 (3281), and the flowchart proceeds to step 3282.
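The three outcomes of the combination 3207 (multiply, add across "and", or register separately) can be sketched on a plain list that stands in for the retention table 3124. The list representation and the function names are assumptions of this example. `next_power_of_ten` implements vl as the smallest power of ten greater than the value, which reproduces the four examples given in the text but is otherwise an interpretation.

```python
def is_number(item):
    """True for numeric table items (bool is excluded on purpose)."""
    return isinstance(item, (int, float)) and not isinstance(item, bool)


def next_power_of_ten(v):
    """Smallest power of ten greater than v: 8 -> 10, 98 -> 100, 11 -> 100."""
    vl = 1
    while vl <= v:
        vl *= 10
    return vl


def combine_with_preceding(table, v_nu):
    """Fold the current numeric value v_nu into the table (modified in place)."""
    if table and is_number(table[-1]):
        table[-1] = table[-1] * v_nu            # multiply with preceding value (3274)
        return table
    if len(table) >= 2 and table[-1] == "and" and is_number(table[-2]):
        vl = next_power_of_ten(v_nu)
        if table[-2] % vl == 0:                 # mod(p->v, vl) == 0 (3279)
            table[-2] = table[-2] + v_nu        # add across "and" (3280)
            table.pop()                         # drop the "and" item (3281)
            return table
    table.append(v_nu)                          # register as a separate value (3284)
    return table
```

Feeding "To", "him", 2, 1000, "and", 2 through this function leaves the table as `["To", "him", 2002]`, whereas `["I", "and"]` followed by 2 keeps "and" and registers 2 separately, as in the "I and two" example.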

An explanation will be given using an example. If glossary lookup is carried out for the input character array "To him two thousand and twenty-two ..." shown in Figure 25, the glossary entry information is written into the glossary information retention table 3124 as shown in Figure 26A. For "him", for example, the start position is "4", the end position is "6", and the part of speech is a pronoun. In the numerical processing, it is first judged for "two" that the numeric flag is "1" (3205) and that its numerical value is "2". Because the entry preceding "two" in this character array is not a numerical value, "two" is stored directly in the table 3124 (3206, 3208, 3284).

The pointer is then advanced and processing continues with "thousand". Its numeric flag is "1" and its numerical value is "1000" (3205, 3206). Because, in addition, the numerical value of the preceding entry is "2" (3207, 3273), the multiplication 2x1000 is performed (3274), and the result is stored in the table 3124 (see Figure 26B). For the following "and", the glossary information is temporarily stored as it is in the table 3124 (see Figure 26C).

The pointer is advanced further to process "twenty-two". Because these hyphen-joined words are not found among the glossary entries (3212), "20+2=22" is computed by the dedicated processing unit 3213 (3237, 3239-3241). Because the preceding word is "and" (3272) and the numerical value before it is "2000" (3277), the numerical value "22" is rounded up at its most significant digit to "100" (3278), and a division operation (3279) is performed. Because the remainder is "0", the addition 3280 of "2000" and "22" is performed. The information for "and" is eliminated (3282) from the retention table 3124, and the result of the addition, "2022", is held as a numerical value in the table 3124, so that "two thousand and twenty-two" is recognized as "2022". The compound arrangement with the preceding numerical value in 3207 is carried out in this manner.
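The combination rules walked through in this example can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `WORD_VALUES`, `round_up_power_of_ten`, `word_value` and `combine_numbers` are hypothetical, the word list is a tiny stand-in for the glossary, and only the cases described in the text (multiplication by a preceding numerical value, addition across "and" when the remainder is zero, summing of hyphenated words) are modeled.

```python
# Hypothetical stand-in for the numeric entries of the glossary.
WORD_VALUES = {"two": 2, "twenty": 20, "hundred": 100, "thousand": 1000}


def round_up_power_of_ten(v):
    """Smallest power of ten greater than v, e.g. 22 -> 100 (the value v1)."""
    v1 = 1
    while v1 <= v:
        v1 *= 10
    return v1


def word_value(token):
    """Numeric value of a token, or None if it is not a number word.

    Hyphen-joined words are summed, e.g. "twenty-two" -> 20 + 2.
    """
    parts = token.lower().split("-")
    if all(p in WORD_VALUES for p in parts):
        return sum(WORD_VALUES[p] for p in parts)
    return None


def combine_numbers(tokens):
    """Build a simplified retention table; ints are numerical entries."""
    table = []
    for tok in tokens:
        v = word_value(tok)
        if v is None:
            table.append(tok)  # non-numeric entry, kept as it is
        elif table and isinstance(table[-1], int):
            table[-1] *= v  # preceding entry is numeric: multiply
        elif (
            len(table) >= 2
            and table[-1] == "and"
            and isinstance(table[-2], int)
            and table[-2] % round_up_power_of_ten(v) == 0
        ):
            table[-2] += v  # remainder is zero: add across "and"
            table.pop()  # and drop the "and" entry
        else:
            table.append(v)  # register the value on its own
    return table


print(combine_numbers("To him two thousand and twenty-two".split()))
```

With the inputs from the text, "I and two" keeps "two" as a separate value because no numerical value precedes "and", whereas "two thousand and two" collapses into a single entry.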

Another example will now be shown. As shown in Figure 27, analysis is carried out for the input character array "You said $1,000.5 thousand was ...". "$1,000.5" is not registered in the glossary 3018. Its first character is a currency symbol, which can be recognized as such from the glossary entries. It is registered independently in the retention table 3124 (3214, 3216, Figure 29A).

Next, "1,000.5" is converted into the numerical value "1000.5" by the consecutive numeric character processing 3215. Because the preceding entry is the symbol and not a numerical value, the numerical value is registered as it is (3280-3273, 3284, Figure 29B).

The next word, "thousand", is a numeral and its numerical value is "1000". Because the preceding entry is a numerical value (3272, 3273), the calculation "1000.5x1000=1000500" is performed (3274) (Figure 29C).

After the glossary lookup has been completed in this manner, the contents held in the glossary information retention table 3124 are examined sequentially. Because the currency symbol is present immediately before the numerical value "1000500", the two are combined and "$1000500" is formed as a single independent entry (3209, 3221-3223; Figure 29D).
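The currency example can be illustrated with a short Python sketch. It is a simplified reading of the steps above, not the device's actual procedure: `CURRENCY_SYMBOLS`, `MULTIPLIERS`, `to_number`, `analyze` and `merge_currency` are hypothetical names, and the input is assumed to be already split at spaces.

```python
CURRENCY_SYMBOLS = {"$"}          # assumed set of currency symbols
MULTIPLIERS = {"thousand": 1000}  # assumed numeral words


def to_number(text):
    """Convert a digit string such as "1,000.5" to a float, else None."""
    cleaned = text.replace(",", "")
    if not cleaned or not cleaned[0].isdigit():
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def analyze(tokens):
    """First pass: register symbols and numbers, multiply by numerals."""
    table = []
    for tok in tokens:
        if len(tok) > 1 and tok[0] in CURRENCY_SYMBOLS:
            table.append(tok[0])  # symbol is registered independently
            tok = tok[1:]
        value = to_number(tok)
        if value is None and tok.lower() in MULTIPLIERS:
            value = float(MULTIPLIERS[tok.lower()])
            if table and isinstance(table[-1], float):
                table[-1] *= value  # preceding entry is numeric
                continue
        table.append(tok if value is None else value)
    return table


def merge_currency(table):
    """Second pass: join a currency symbol with the number that follows."""
    merged = []
    for entry in table:
        if (
            isinstance(entry, float)
            and merged
            and merged[-1] in CURRENCY_SYMBOLS
        ):
            text = str(int(entry)) if entry.is_integer() else str(entry)
            merged[-1] = merged[-1] + text
        else:
            merged.append(entry)
    return merged


tokens = "You said $1,000.5 thousand was".split()
print(merge_currency(analyze(tokens)))
```

Keeping the symbol as its own entry during the first pass mirrors the text: the multiplication by "thousand" operates on a plain numerical value, and the symbol is only attached once the whole table is re-examined.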

An explanation will now be given of the fourth embodiment of the present invention.

Figure 30 shows the fourth embodiment of the language analysis device according to the present invention, applied as an automatic English-Japanese translation device.

This embodiment includes an input processing section 4014, to which data is input from an input device 4012. The input device 4012 comprises, for example, a keyboard with character keys such as alphanumeric keys and function keys, an optical character reader for reading English text recorded on paper, or a file reading device for a magnetic disk.

The input processing section 4014 has an input character array buffer 4014a and stores the English sentence input from the input device 4012 in the input character array buffer 4014a. The input processing section 4014 reads out the input sentence stored in the input character array buffer 4014a and supplies it to a unit cut-out section 4016. The unit cut-out section 4016 is a functional section that separates the glossary reference units from the input sentence sent by the input processing section 4014. A delimiter table 4018 contains the delimiters, such as spaces and commas.

The unit cut-out section 4016 reads the delimiters from the delimiter table 4018 and divides the input sentence sent from the input processing section 4014 into character arrays, the units used for searching a reference glossary 4020, by splitting the sentence wherever a delimiter is present. The resulting character arrays are input to the glossary lookup section 4022.
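As a rough illustration of this cut-out step, the following Python sketch splits a sentence at delimiters. The names `DELIMITERS` and `cut_out_units` are hypothetical, the delimiter set is a stand-in for the delimiter table 4018, and dropping the delimiters from the output is an assumption, since the text does not say whether they are retained.

```python
# Assumed contents of the delimiter table; the text names spaces and commas.
DELIMITERS = {" ", ","}


def cut_out_units(sentence, delimiters=DELIMITERS):
    """Split a sentence into glossary reference units at each delimiter."""
    units = []
    current = []
    for ch in sentence:
        if ch in delimiters:
            if current:  # close the unit collected so far
                units.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # last unit, if the sentence does not end in a delimiter
        units.append("".join(current))
    return units


print(cut_out_units("Mr. Smith lives in New York, USA"))
```

Each returned unit would then serve as one search key for the reference glossary.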

The glossary lookup section 4022 searches the reference glossary 4020 for the input sentence that has been divided into glossary reference units and sent by the unit cut-out section 4016. The reference glossary 4020 stores entries for English character arrays together with their part of speech, type information, and so on, as shown for example in Figure 31. Besides the proper nouns shown in the figure, the reference glossary 4020 also contains character arrays for other parts of speech, for example verbs and adjectives. The registration of "proper noun" as a part of speech in the figure serves the proper noun processing described below and does not carry the same meaning as a proper noun in conventional grammar. The type information indicates what the proper noun represents; it cannot always be limited to a single type because, as described later, a proper noun can represent several types depending on how it is used.

The glossary lookup section 4022 searches the reference glossary 4020 for the character array divided into glossary reference units. If the character array is a proper noun, it is passed to the proper noun processing section 4024, where the proper noun processing described later is carried out. If it is not a proper noun, it is passed to the processing section 4036 and held in the glossary information retention table 4036a of the processing section 4036.

The proper noun processing section 4024 comprises a preceding-sentence-end processing section 4026, a preceding proper noun processing section 4028, a section 4030 for processing the proper noun itself, a processing section 4032 for the preceding proper noun and the proper noun itself, and a section 4034 that supplies standard type information.

The preceding-sentence-end processing section 4026 judges whether or not the character array preceding the one that was input to and looked up by the glossary lookup section 4022 is the end of a sentence. If the preceding character array is the end of a sentence, the capital letter at the beginning of the character array being processed is converted to lowercase and the array is sent back to the glossary lookup section 4022, which searches the reference glossary 4020 again. A character array that is not found even on this second attempt is judged to be an unregistered proper noun, sent to the processing section 4036, and stored in the glossary information retention table 4036a. If, on the other hand, the preceding character array is not at the end of a sentence, the array is supplied to the processing section 4036 as a proper noun whose type information is unknown and is registered in the glossary information retention table 4036a, as described later.
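The sentence-start handling can be sketched as follows. This is a minimal illustration under several assumptions: the function name `classify_unknown`, the result labels, and the test for "end of a sentence" (a preceding unit ending in ".", "!" or "?", or no preceding unit at all) are all hypothetical, and the glossary is reduced to a plain dictionary.

```python
SENTENCE_END = (".", "!", "?")  # assumed markers for the end of a sentence


def classify_unknown(word, prev, glossary):
    """Handle a capitalized unit that was not found in the glossary.

    word: the unit being processed; prev: the preceding unit, or None.
    Returns a (text, classification) pair.
    """
    at_sentence_start = prev is None or prev.endswith(SENTENCE_END)
    if at_sentence_start:
        # Lowercase the leading capital and look the unit up a second time.
        lowered = word[:1].lower() + word[1:]
        if lowered in glossary:
            return lowered, glossary[lowered]
        return word, "unregistered proper noun"
    # Capitalized in mid-sentence: proper noun whose type is unknown.
    return word, "proper noun (type unknown)"


glossary = {"the": "article"}  # toy glossary for illustration only
print(classify_unknown("The", None, glossary))
print(classify_unknown("Ricoh", "by", glossary))
```

A real sentence-end test would also have to cope with abbreviations such as "Mr.", which this sketch would wrongly treat as ending a sentence.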

The preceding proper noun processing section 4028 analyzes the type information of the preceding character array sent by the preceding-sentence-end processing section 4026 and passes the result to the section 4030 for processing the proper noun itself. The section 4030 examines the type information of the proper noun being analyzed and, as described later, if the type information of either the proper noun or the preceding proper noun is not registered, the proper noun and the preceding proper noun are analyzed together using the registered information of the other, and the result is held in the glossary information retention table 4036a in the processing section 4036.

The processing section 4032 for the preceding proper noun and the proper noun itself examines the part common to the type information of the proper noun being analyzed and that of the preceding proper noun, analyzes the proper noun with this common part, passes the result to the processing section 4036, and stores it in the glossary information retention table 4036a in the processing section 4036.

The section 4034 that supplies standard type information assigns type information to a proper noun, read out from the glossary information retention table 4036a and sent via the unit cut-out section 4016 to the glossary lookup section 4022, if no type information is found as a result of the search of the reference glossary 4020 by the glossary lookup section 4022. Because a proper noun often has a different type depending on the situation in which it is used, all type information considered necessary is supplied, for example "person, place, group and others". After the proper noun has been given the type information, the section 4034 sends the data to the processing section 4036 and stores it in the glossary information retention table 4036a in the processing section 4036.

The processing section 4036, which is provided with the glossary information retention table 4036a, stores the data sent from the processing section 4032 for the preceding proper noun and the proper noun itself, from the section 4034 supplying standard type information, or from the glossary lookup section 4022 in the glossary information retention table 4036a, and then reads out the stored data and supplies it to the analysis section 4038. The analysis section 4038 analyzes the input sentence that has undergone morphological analysis and has been read out from the glossary information retention table 4036a.

The operation of this device will be explained with reference to the flowchart of Figure 32.

First, an English input sentence is read from the input device 4012 into the input processing section 4014 (4100). The sentence read into the input processing section 4014 is loaded into the input character array buffer 4014a. The input sentence loaded in the input character array buffer 4014a is read out to the unit cut-out section 4016.

When the input sentence has been received, the unit cut-out section 4016 reads the delimiters from the delimiter table 4018 in order to cut out the glossary reference units (4102). That is, the character arrays that make up the input sentence are divided successively, starting from the beginning, into retrieval key character arrays, the units to be searched for in the reference glossary 4020, by splitting at the places where delimiters such as a space or a colon are present. The section judges whether the last glossary reference unit, that is, the last retrieval key character array, has already been processed (4104), and if retrieval key character arrays remain, the next retrieval key character array is sent to the glossary lookup section 4022.

When a retrieval key character array has been sent to the glossary lookup section 4022, the glossary lookup section 4022 searches the reference glossary 4020 for the retrieval key character array (4106). The section judges whether or not the retrieval key character array is present among the entries of the reference glossary 4020 shown in Figure 31 (4108) and, if there is an entry, the part of speech stored in the reference glossary 4020 is read out and it is judged whether or not the retrieval key character array is a proper noun (4110).

If the retrieval key character array is not a proper noun, the glossary lookup section 4022 sends the data read from the reference glossary 4020 to the processing section 4036 and registers it in the glossary information retention table 4036a (4112). When the data has been stored in the glossary information retention table 4036a, information indicating that the data has been stored, together with the data for the retrieval key character array stored just before, is input from the processing section 4036 to the unit cut-out section 4016. The flow then returns to step 4102, and another glossary reference unit is cut out in the unit cut-out section 4016.

If the retrieval key character array is a proper noun in step 4110, the glossary lookup section 4022 sends the data of the proper noun read from the reference glossary 4020 (hereinafter simply referred to as the proper noun) to the proper noun processing section 4024, together with the data of the preceding retrieval key character array, which is input from the glossary information retention table 4036a in the processing section 4036 via the unit cut-out section 4016 to the glossary lookup section 4022. The processing of the glossary-registered proper noun is then carried out in the proper noun processing section 4024 (4124).

The processing of the glossary-registered proper noun is explained with reference to the flowchart of Figure 33.

The data sent from the glossary lookup section 4022 to the proper noun processing section 4024 is passed through the preceding-sentence-end processing section 4026 to the preceding proper noun processing section 4028. During the processing of a glossary-registered proper noun, the preceding-sentence-end processing section 4026 does not operate.

The preceding proper noun processing section 4028 judges whether or not the entry preceding the proper noun is a proper noun not registered in the reference glossary 4020, that is, whether or not it has gone through the processing of unregistered proper nouns described later (4200). If it is an unregistered proper noun, the processing section judges the whole of the proper noun and the preceding unregistered proper noun to be a single proper noun carrying the type information of the proper noun (4202), sends the data to the processing section 4036, and stores it in the glossary information retention table 4036a (4218).

If the preceding proper noun processing section 4028 judges in step 4200 that the entry preceding the proper noun is not an unregistered proper noun, it judges whether or not that entry is a proper noun registered in the reference glossary 4020 (4204). If the entry immediately preceding the proper noun is a registered proper noun, it is judged whether or not the type information of the preceding proper noun is unknown, that is, whether or not it is missing from the reference glossary 4020 (4206).

If the type information of the preceding proper noun is unknown, the flow proceeds to step 4202: the whole of the proper noun itself and the immediately preceding proper noun is judged to be a single proper noun with type information (4202), and the preceding proper noun processing section 4028 sends the data to the processing section 4036. The data sent to the processing section 4036 is registered in the glossary information retention table 4036a (4218).

If, in step 4206, the preceding-proper-noun processing section 4028 judges that the type information for the preceding proper noun is not unknown, that is, it is registered in the reference dictionary 4020, the data is sent from the preceding-proper-noun processing section 4028 to the section 4030 for processing the proper noun itself. The section 4030 judges whether or not the type information of the proper noun itself is unknown (4208). If it is unknown, the section 4030 judges the whole formed by the proper noun itself and the immediately preceding proper noun to be a proper noun with the type information of the preceding proper noun (4210) and sends the data to the processing section 4036. The data sent to the processing section 4036 is registered in the dictionary information retention table 4036a.

If the section 4030 for processing the proper noun itself judges that the type information of the proper noun itself is not unknown, that is, it is registered in the reference dictionary 4020, the section 4030 sends the data to the processing section 4032 for the preceding proper noun and the proper noun itself. The processing section 4032 judges whether the type information of the proper noun itself and that of the immediately preceding proper noun have a type in common (4212). If they do, the whole formed by the proper noun itself and the immediately preceding proper noun is judged to be a proper noun with the common type information (4214), and the data is transferred to the processing section 4036. The data sent to the processing section 4036 is registered in the dictionary information retention table 4036a (4218).

If the type information of the proper noun itself and that of the immediately preceding proper noun have no type in common, the processing section 4032 judges that the proper noun is a proper noun separate from the immediately preceding one, with the type information retrieved from the reference dictionary 4020 (4216), and sends the data to the processing section 4036. The data sent to the processing section 4036 is registered in the dictionary information retention table 4036a (4218).
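The decision sequence of steps 4200 to 4218 can be sketched roughly as follows. This is a minimal illustration, not the circuit of the embodiment: the `Entry` record, the function name and the use of a Python list for the retention table 4036a are assumptions made for the example, and so is the choice in step 4202 to let the merged unit take over the type information of the current proper noun.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one unit in the retention table."""
    text: str
    is_proper: bool
    registered: bool = True
    types: Optional[FrozenSet[str]] = None  # None means "type information unknown"


def process_registered_proper_noun(table: List[Entry], cur: Entry) -> None:
    """Register `cur`, merging it with the preceding proper noun where applicable."""
    prev = table[-1] if table else None
    # 4200 / 4204: the preceding unit is no proper noun at all -> register alone.
    if prev is None or not prev.is_proper:
        table.append(cur)
        return
    merged = f"{prev.text} {cur.text}"
    if not prev.registered or prev.types is None:
        # 4200 or 4206 -> 4202: merge; taking cur's type information is an assumption.
        table[-1] = Entry(merged, True, cur.registered, cur.types)
    elif cur.types is None:
        # 4208 -> 4210: merge, using the preceding proper noun's type information.
        table[-1] = Entry(merged, True, cur.registered, prev.types)
    else:
        common = prev.types & cur.types  # 4212
        if common:
            # 4214: merge and keep only the common type information.
            table[-1] = Entry(merged, True, cur.registered, common)
        else:
            # 4216: no common type, so the proper noun is registered separately.
            table.append(cur)
```

Merging by overwriting the last table entry is a design choice of the sketch; the description only states that the whole is registered in the table 4036a as one proper noun.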

Reference is again made to Figure 32. If, in step 4108, the retrieval key character string is not found among the entries of the reference dictionary 4020, it is judged whether or not the first character of the retrieval key character string is a capital letter (4116). If it is not a capital letter, the dictionary lookup section 4022 judges the retrieval key character string to be an unregistered word and sends it to the processing section 4036 for registration in the dictionary information retention table 4036a (4118).

If the first character is a capital letter, the data for the retrieval key character string is sent, together with the data for the preceding retrieval key character string, from the dictionary lookup section 4022 to the proper noun processing section, where the processing for a proper noun not registered in the dictionary is carried out (4120).
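The branching of Figure 32 after a dictionary lookup (steps 4108 to 4120) might be sketched as below. The function name, the returned labels and the shape of the dictionary (a mapping from key string to a record with an `is_proper` flag) are hypothetical and chosen only for illustration.

```python
from typing import Dict


def classify_lookup(key: str, dictionary: Dict[str, dict]) -> str:
    """Return which processing path of Figure 32 applies to `key`."""
    entry = dictionary.get(key)
    if entry is not None:                      # 4108: an entry exists
        if entry["is_proper"]:                 # 4110: the entry is a proper noun
            return "registered_proper_noun"    # 4114: Figure 33 processing
        return "ordinary_entry"                # 4112: store the dictionary data
    if key[:1].isupper():                      # 4116: first character is a capital
        return "unregistered_proper_noun"      # 4120: Figure 34 processing
    return "unregistered_word"                 # 4118
```

With a toy dictionary that contains "Walter" as a proper noun and "met" as an ordinary word, "Walter" takes the Figure 33 path, "met" is stored directly, and "Johnson" takes the Figure 34 path.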

With reference to Figure 34, the processing of a proper noun not registered in the dictionary will now be explained.

The data of the retrieval key character string is sent, together with the data of the preceding retrieval key character string, to the sentence-end processing section 4026, which judges whether or not the end of the preceding entry is a candidate for the end of the sentence (4300). This judgment is made by checking whether the preceding entry ends in a sentence-end mark, such as a standalone period (.) or the like.

If the end of the preceding entry is a candidate for the end of the sentence, the data is sent from the sentence-end processing section 4026 to the preceding-proper-noun processing section 4028. The section 4028 judges the preceding entry to be the end of the sentence (4302), converts the first character of the retrieval key character string to a lowercase letter and sends the string to the dictionary lookup section 4022.

The dictionary lookup section 4022 searches the reference dictionary 4020 for the retrieval key character string converted to lowercase (4304) and judges whether or not there is an entry for it in the reference dictionary 4020 (4306). If there is an entry, the dictionary lookup section 4022 sends the data obtained from the reference dictionary 4020 to the processing section 4036, where it is added to the dictionary information retention table 4036a (4308). If there is no entry, the dictionary lookup section 4022 converts the first character of the received character string back to a capital letter and sends the string, as an unregistered proper noun, to the processing section 4036 for registration in the dictionary information retention table 4036a (4310).

If, in step 4300, the sentence-end processing section 4026 judges that the end of the preceding entry is not a candidate for the end of the sentence, the data is passed from the section 4026 to the preceding-proper-noun processing section 4028, and the section 4028 judges that the preceding entry is not the end of the sentence (4312).

The data is passed from the preceding-proper-noun processing section 4028 to the section 4030 for processing the proper noun itself, and the section 4030 judges the retrieval key character string to be a proper noun whose type information is unknown (4314).

The section 4030 for processing the proper noun itself returns the data to the preceding-proper-noun processing section 4028, and the section 4028 carries out the processing for a proper noun registered in the dictionary (4316). This processing is the same as that shown in Figure 33.
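The handling of a capitalized string that has no dictionary entry (steps 4300 to 4316) could be sketched as follows. The function name, the result labels and the set of sentence-end marks are assumptions; the description names only the standalone period explicitly, so only that mark is included here.

```python
from typing import Dict, Optional, Tuple

# Only "." is named in the description; other marks would be added here.
SENTENCE_END_CANDIDATES = {"."}


def handle_unregistered_capitalized(
    word: str, prev_token: str, dictionary: Dict[str, str]
) -> Tuple[str, str, Optional[str]]:
    """Return (label, text, dictionary data) for a capitalized word without an entry."""
    if prev_token in SENTENCE_END_CANDIDATES:            # 4300 -> 4302
        lowered = word[0].lower() + word[1:]             # lowercase the first character
        if lowered in dictionary:                        # 4304, 4306
            return ("dictionary_entry", lowered, dictionary[lowered])  # 4308
        restored = lowered[0].upper() + lowered[1:]      # capitalize again
        return ("unregistered_proper_noun", restored, None)            # 4310
    # 4312, 4314: not at a sentence start, so treat it as a proper noun of
    # unknown type; the Figure 33 processing (4316) would follow from here.
    return ("proper_noun_type_unknown", word, None)
```

The retry with a lowercase initial is what lets an ordinary word that is capitalized only because it opens a sentence still be found in the dictionary.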

Referring again to Figure 32, if in step 4104 the extracted dictionary reference unit is at the end of the string, the dictionary lookup section 4022 sends a signal indicating this to the section 4034 that provides the default type information. The section 4034 reads out the information registered in the dictionary information retention table 4036a in the processing section 4036 and provides the proper nouns with the default type information (4122).

With reference to Figure 35, the provision of default type information to a proper noun will now be explained.

In the section 4034, which provides the default type information, a pointer is first set to the beginning of the data in the dictionary information retention table 4036a (4400). That is, the pointer is set to the entry at the beginning of the input sentence, which has been divided into entries, each of which has been given information by searching the reference dictionary 4020. It is then judged whether or not the entry designated by the pointer is a proper noun (4402), and if it is a proper noun, it is judged whether or not its type information is known (4404). If it is not a proper noun, the flow proceeds to step 4408 and the pointer is advanced to the next entry.

If, in step 4404, the type information for the proper noun is unknown, the default type information is provided (4406). The default type information is attached to a proper noun whose type information is unknown, as shown in the lower part of Figure 36. In the lower part of the figure, for example, the proper noun "Johnson", whose type information is unknown, is given every kind of type information, namely "person, place, group, and others". Giving all kinds of type information to a proper noun whose type information is unknown leaves room for the subsequent syntactic analysis to analyze the proper noun as any of these types.

If, in step 4404, the type information of the proper noun is not unknown, the flow proceeds to step 4408 and the pointer is advanced to the next entry.

It is judged whether or not the pointer has reached the end of the entries (4408). If it has not, the flow returns to step 4402 and it is judged whether or not the next entry is a proper noun. If the pointer has reached the end, the provision of the default type information is terminated.
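The pointer loop of Figure 35 (steps 4400 to 4408) amounts to a single pass over the retention table. A minimal sketch, assuming each table entry is a dictionary with hypothetical keys `is_proper` and `types`, where `types` is `None` when the type information is unknown:

```python
from typing import List

# The four kinds of type information named for Figure 36.
DEFAULT_TYPES = frozenset({"person", "place", "group", "others"})


def assign_default_types(table: List[dict]) -> List[dict]:
    """Give every proper noun of unknown type the full default type set."""
    pointer = 0                                   # 4400: start of the table
    while pointer < len(table):                   # 4408: stop at the end
        entry = table[pointer]
        if entry["is_proper"] and entry["types"] is None:   # 4402, 4404
            entry["types"] = set(DEFAULT_TYPES)             # 4406
        pointer += 1                              # advance to the next entry
    return table
```

Entries that already carry type information, and entries that are not proper nouns, pass through unchanged.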

After the default type information has been provided to the proper nouns, the data registered in the dictionary information retention table 4036a is output from the processing section 4036 to the syntax analysis section 4036 (4124), which completes the morphological analysis in this embodiment.

The operation of the device discussed above will now be explained with reference to an example of an input sentence.

The explanation is given for the case where the input sentence is "In [...]

[...] has an entry in the reference dictionary 4020 and is a proper noun (4110), so the processing for a proper noun registered in the dictionary is carried out (4114). The flow then proceeds to Figure 33. Because "Station" in the preceding part is not an unregistered proper noun (4200) but a registered proper noun (4204), because its type information (place, group) is not unknown (4206), and because "Mr." has the type information "person", which is likewise not unknown (4208), it is checked whether the type information of "Station" in the preceding part and that of "Mr." have anything in common (4212). Because "Station" has the information "place, group" while "Mr." has the information "person", there is nothing in common, and "Mr." is registered on its own as a proper noun with the type information "person" (4216).

The flow then returns to Figure 32, and the reference dictionary 4020 is searched for "Walter" (4106). Because there is an entry for "Walter" in the reference dictionary 4020 (4108) and it is a proper noun (4110), the processing for a proper noun registered in the dictionary is carried out (4114). The flow proceeds to Figure 33. Because "Mr." in the preceding part is not an unregistered proper noun (4200) but a registered proper noun (4204), because its type information is "person" and not unknown (4206), and because the type information for "Walter", which is "person, place, group", is likewise not unknown (4208), the two sets of type information are compared for a common type (4212). Because the type information "person" is present in both, "Mr. Walter" is registered together as a single noun with the type information "person" (4214).

Next, the reference dictionary 4020 is searched for "met". Because there is an entry (4108) and it is not a proper noun (4110), the data obtained from the reference dictionary 4020 is registered in the dictionary information retention table 4036a (4112).

The reference dictionary 4020 is then searched for "Johnson" (4106). Because there is no entry for "Johnson" (4108) and its first character is a capital letter (4116), the processing for a proper noun not registered in the dictionary is carried out (4120). The flow then proceeds to Figure 34. Because "met" in the preceding part is not a candidate for the end of the sentence (4300), "met" is judged not to be the end of the sentence (4312), "Johnson" is regarded as a proper noun with unknown type information (4314), and the processing for a proper noun registered in the dictionary is carried out (4316). The flow then proceeds to Figure 33. Because "met" in the preceding part is neither an unregistered proper noun (4200) nor a registered proper noun (4204), "Johnson" is registered on its own as a proper noun whose type information is unknown.

After the above operations, the default type information is provided to the proper nouns as shown in Figure 35.

The pointer is set to "In", the first dictionary reference unit (4400). Because this is not a proper noun (4402), the pointer is advanced (4408) and set to "Tokyo Station". Because "Tokyo Station" is a proper noun (4402) and its type information is not unknown, the whole of "Tokyo Station" having been recognized as "place, group" in the preceding processing for a registered proper noun (4404), the pointer is advanced (4408) and set to "Mr. Walter".

Because "Mr. Walter" is also a proper noun (4402) and its type information is not unknown (4404), the pointer is advanced (4408). Because "met" is not a proper noun (4402), the pointer is advanced again (4408). Because "Johnson" is a proper noun (4402) and its type information is unknown (4404), the default type information is provided (4406), and "Johnson" is given the type information "person, place, group, and others" as shown in Figure 36.

As described above, in this embodiment an English input sentence is divided into retrieval key character strings, each of which is looked up in the reference dictionary 4020; if the reference dictionary 4020 contains an entry for the string as a proper noun, the processing for a registered proper noun is carried out. In that processing the preceding retrieval key character string is taken into account, and if it is a proper noun, the type information of the preceding retrieval key character string and that of the proper noun in question are examined. If one of them has no type information, the type information of the other is provided, while if both have type information, the part they have in common is taken as the type information for these proper nouns. It is therefore possible to give suitable type information to a proper noun that has none, and also to narrow type information that has already been provided down to more suitable type information. This makes the subsequent syntax analysis more effective and leads to a correct translation.

Furthermore, if the first character of a character string not registered in the reference dictionary 4020 is a capital letter and the preceding character string has been judged to be the end of a sentence, the capital letter is converted to a lowercase letter and the reference dictionary 4020 is searched again, so that a character string at the beginning of a sentence can also be looked up in the reference dictionary 4020. If a character string beginning with a capital letter occurs somewhere other than at the beginning of a sentence, it is judged to be a proper noun, and its type information is provided by a proper noun with registered type information if one occurs immediately before or after it. A proper noun not registered in the reference dictionary 4020 can therefore be analyzed to a certain extent.

Furthermore, because a proper noun without type information is given all the necessary type information, and the type information that is not needed is removed while the word is processed, it is possible to analyze a proper noun whose type information is unknown or a proper noun that is not registered.

Because several items of type information can be attached to a given proper noun, and because the appropriate items are selected according to the type information of the proper noun before or after it, a proper noun that has several kinds of type information can be resolved from its relationship with its neighbors, which makes effective analysis of the input sentence possible.

A fifth embodiment of the present invention will now be explained.

Figure 38 illustrates the overall structure of the fifth embodiment of the language analysis device according to the present invention, applied to a device for automatically translating English into Japanese.

This embodiment includes an input section 5010, through which an English text 5012 to be translated into Japanese is entered. The input section 5010 may, for example, be provided with a keyboard having character keys such as alphanumeric keys and function keys, an optical character reader (OCR) for reading an English text recorded on paper, and/or a file storage device for reading an English text recorded on a storage medium such as a magnetic storage medium.


The English text entered from the input section 5010 is read into a pre-editing section 5014, in which preprocessing for the translation is carried out. Here, mainly sentence recognition and unknown-word processing are performed. This functions as part of the morphological analysis.

The pre-edited English data are transferred, together with the information obtained during pre-editing, to a morphological analysis section 5016. The section 5016 divides the data into sentences while consulting a dictionary 5018, analyzes English morphemes, processes unknown words, proper nouns and various groupings such as time expressions and numerical expressions, and performs operations on the whole sentence such as searching for fixed expressions and idioms. The morphological analysis rules are stored in the analysis rule file 5036.

After morphological analysis, the English data are transferred, together with the dictionary information obtained during the analysis, to an analysis section I 5020. The analysis section I 5020 is a functional section that analyzes the surface structure of the sentence by applying grammatical rules to the English data and finds all structural possibilities.

The English data that have undergone structural analysis in the analysis section I 5020 are sent, together with the analysis information, to the analysis section II 5022, where one solution is selected from the results of the surface-structure analysis in the analysis section I by applying syntactic analysis.

In this way a plausible analysis structure for the English sentence is prepared and its structure is established. The syntactic analysis rules are likewise stored in the analysis rule file 5036.

After syntactic analysis, the English data are transferred as syntactic structure data to a structure transformation section 5024. In the structure transformation section 5024, a syntactic structure for the corresponding Japanese sentence is prepared from the syntactic structure, which is an intermediate structure of the English sentence, by transforming it into the Japanese underlying structure, from which the Japanese sentence can easily be generated.

The syntactic structure data representing the Japanese underlying structure obtained by the structure transformation are transferred to a translation generating section 5026, in which a translated sentence is generated. This is a functional section that generates a Japanese sentence from the Japanese structure.


The Japanese data resulting from the generation of the translation, that is, the data of the translated sentence, are sent to a post-editing section 5030. The post-editing section 5030 modifies the translated data by consulting the dictionary 5018, using information obtained in the translation process, in order to complete a more natural Japanese sentence. The data for the Japanese sentence are transferred to an output section 5032 and output from it as the translated Japanese sentence 5034. The output section may, for example, be provided with a printer, a display screen or a file storage device such as a magnetic disk store.

The flow of this series of translation operations is controlled by a control section 5038, which supervises the entire device.
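The order of the processing sections described so far can be summarized in a small sketch. It is only an illustration of the data flow under stated assumptions: each stage is a placeholder that records that it ran, and the names `make_stage`, `STAGES` and `run_translation` are hypothetical. Only the stage order and the reference numerals come from the description.

```python
from typing import Callable, Dict, List


def make_stage(name: str) -> Callable[[Dict], Dict]:
    """Build a placeholder stage that only records that it ran."""
    def stage(data: Dict) -> Dict:
        return {**data, "trace": data["trace"] + [name]}
    return stage


# Stage order as described in the text; the numbers are the reference numerals.
STAGES: List[Callable[[Dict], Dict]] = [
    make_stage("pre-editing 5014"),
    make_stage("morphological analysis 5016"),
    make_stage("analysis I 5020"),
    make_stage("analysis II 5022"),
    make_stage("structure transformation 5024"),
    make_stage("translation generation 5026"),
    make_stage("post-editing 5030"),
]


def run_translation(english_text: str) -> Dict:
    """Play the role of the control section: pass the data through each stage in turn."""
    data: Dict = {"text": english_text, "trace": []}
    for stage in STAGES:
        data = stage(data)
    return data
```

Calling `run_translation` on any input returns the text together with a trace listing the seven stages in order, which mirrors the hand-off of data and accumulated information from one section to the next.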

The dictionary 5018 contains dictionary data for English and Japanese words; in addition to the vocabulary itself, it holds various information such as co-occurrence (an indication of whether words occur together), meanings, singular or plural form, part of speech, and so on. The analysis rule file 5036 contains the morphological analysis rules and the syntactic analysis rules.

The control section 5038 is connected to an operation and display section 5040. The operation and display section 5040 has operation keys with which the operator of the device can give various instructions, for example a translation instruction key, cursor keys, etc., and a display unit or indicator screen on which the entered English text, the Japanese text resulting from the translation, intermediate data such as dictionary information, various indications for the operator and the like can be shown. The section may be arranged so that most of its operation and indication functions are incorporated in the keyboard if it is located at the input section 5010, or in a display panel if it is located at the output section 5032.

Referring to Figure 37, a detailed structure for processing a proper noun in the morphological analysis section 5016 will be considered as an example. Only that part of the analysis section 5016 is illustrated that is directly relevant to understanding the present invention, although there are of course further functional analysis sections. The morphological analysis is performed by having the dictionary lookup section 5104 search the dictionary successively from the beginning of the input character string, using the retrieval key character strings, and by processing the dictionary information thus obtained on the basis of the positional information of the proper noun, as will be described below.

The analysis section 5016 has an input processing section 5100 for receiving the input character string data supplied from the pre-editing section 5014 and for carrying out input processing. The input processing section 5100 has an input character string buffer into which the English character string data are entered as code data, for example in ASCII, and in which they are temporarily stored.

The input character string data temporarily stored in the input processing section 5100 are transferred to a unit cutting section 5102, in which the data are divided into dictionary lookup units such as words. The unit cutting section 5102 is a functional section that identifies the dictionary lookup units forming the retrieval key character strings with which the dictionary 5018 is subsequently consulted in the dictionary lookup section 5104. The dictionary lookup delimiters used in cutting out the units are placed at characters other than English letters, digits, apostrophes, hyphens and periods, and also at an apostrophe that follows a blank character. They are stored in a delimiter table 5108, which is referred to by the unit cutting section 5102 when the dictionary lookup units are cut out.
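The delimiter rule for cutting out dictionary lookup units can be sketched as below. This is an illustration under assumptions, not the device's actual procedure: the function name `cut_units` is hypothetical, the characters kept inside a unit are taken to be ASCII letters, digits, apostrophes, hyphens and periods, and an apostrophe at the very start of the text is treated like one that follows a blank.

```python
def cut_units(text):
    """Split text into dictionary lookup units.

    Letters, digits, apostrophes, hyphens and periods stay inside a unit;
    every other character is a delimiter, as is an apostrophe that follows
    a blank character.
    """
    units = []
    current = []
    prev = " "  # assumption: the start of the text behaves like a blank
    for ch in text:
        inside = (ch.isascii() and ch.isalnum()) or ch in "'-."
        if ch == "'" and prev.isspace():
            inside = False  # apostrophe after a blank acts as a delimiter
        if inside:
            current.append(ch)
        elif current:
            units.append("".join(current))
            current = []
        prev = ch
    if current:
        units.append("".join(current))
    return units
```

With this rule, "Mr." and "Brown's" survive as single units because periods and word-internal apostrophes are not delimiters, while a comma or an opening quotation mark ends the current unit.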

More specifically, the reference dictionary 5018 contains the information relating to the units cut out. As shown in Figure 38 for an example entry, grammatical information such as part-of-speech information is present for each dictionary lookup unit. For a noun, the part-of-speech information indicates whether it is a common noun or a proper noun. For a proper noun, a distinguishing indication states how its position in the sentence is restricted; that is, positional information is stored for a proper noun. This will be described more specifically later.

Other information is also registered, for example whether a noun is countable or uncountable, whether a verb is transitive or intransitive, its translation, and so on.


There are four types of positional information for a proper noun in the present embodiment, namely the patterns "0" to "3". Pattern "0" indicates a proper noun without positional restriction, for example "City" or a personal name such as "Walter". Pattern "1" indicates a proper noun, for example "Mr.", that stands in front of a single proper noun or a series of proper nouns, that is, at the head of a group of proper nouns arranged as one unit. Pattern "2" indicates a proper noun, for example "Station" or "Bay", that stands at the end of a single proper noun or of a group of proper nouns arranged as one unit, and that differs from pattern "3" described next. Pattern "3" indicates a proper noun such as "River" in "the Sumida River", which is the same as pattern "2" except that the definite article "the" accompanies it at the beginning of the group of proper nouns.
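The four positional patterns lend themselves to a small enumeration. The sketch below is one possible encoding, not the dictionary format of the device: the names `PositionPattern`, `EXAMPLE_ENTRIES` and `closes_group` are hypothetical, and only the example words and their pattern numbers are taken from the description.

```python
from enum import IntEnum


class PositionPattern(IntEnum):
    """Positional information of a proper noun (pattern numbers from the text)."""
    UNRESTRICTED = 0    # e.g. "City", "Walter"
    HEAD = 1            # stands in front of a group, e.g. "Mr."
    TAIL = 2            # stands at the end of a group, e.g. "Station", "Bay"
    TAIL_WITH_THE = 3   # like TAIL, with "the" at the start, e.g. "the Sumida River"


# Example dictionary entries, limited to the words named in the description.
EXAMPLE_ENTRIES = {
    "City": PositionPattern.UNRESTRICTED,
    "Walter": PositionPattern.UNRESTRICTED,
    "Mr.": PositionPattern.HEAD,
    "Station": PositionPattern.TAIL,
    "Bay": PositionPattern.TAIL,
    "River": PositionPattern.TAIL_WITH_THE,
}


def closes_group(word):
    """True if the word must be the last member of a proper noun group."""
    tail_patterns = (PositionPattern.TAIL, PositionPattern.TAIL_WITH_THE)
    return EXAMPLE_ENTRIES.get(word) in tail_patterns
```

Using an `IntEnum` keeps the numeric pattern values of the description while giving each one a readable name.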

The dictionary lookup section 5104 is a functional section that obtains dictionary information by searching the dictionary 5018 on the basis of the retrieval key character string supplied from the unit cutting section 5102, and transfers it to the dictionary information retention table 5124, the positional information processing section 5110 and the section 5112 that determines the end of the preceding sentence.

The processing based on patterns "0" to "3", in accordance with the positional information of the proper nouns obtained from the dictionary 5018, is carried out by the proper noun processing sections 5114, 5116 and 5118. Pattern 1 is processed in the proper noun processing section 5114, patterns 2 and 3 in the proper noun processing section 5116, and pattern 0 in the proper noun processing section 5118 or 5114.

In this embodiment, proper nouns are grouped together using as a key a word that forms part of a group of proper nouns making up a single proper noun expression and that is subject to a positional restriction within the group. Thus, even if several proper nouns occur in succession, they can be grouped correctly according to their context instead of always being mistakenly treated as one single group of proper nouns. The operations for this purpose are carried out in the proper noun processing sections 5114, 5116 and 5118. Proper nouns are registered to some extent in the reference dictionary 5018. Proper nouns registered in the dictionary are analyzed in the positional information processing section 5110 and in the proper noun processing sections 5114, 5116 and 5118; these form the processing sections for proper nouns registered in the dictionary. Proper nouns not registered in the dictionary 5018 are analyzed in the section 5112, which judges the end of the preceding sentence, and in the section 5118 if the proper noun has pattern "0"; these form the processing sections for proper nouns not registered in the dictionary.

A proper noun is processed in the following two steps. First, a proper noun is recognized in the input character string. For a word registered in the dictionary 5018, this is done from the proper noun indication in its morpheme actuation information. For a word not registered in the dictionary 5018, it is done from the fact that its first character is an English capital letter, for example "John" or "U.S.".

A group of proper nouns is then combined so that the whole can be treated as a single proper noun. When a unit has been recognized as a proper noun from the dictionary information and the next dictionary lookup unit is also a proper noun, the whole is combined into a single proper noun.

"M. Weber", for example, is analyzed as a whole as a single proper noun. The result of this analysis becomes a candidate for grouping, in the local analysis, into an idiomatic expression that includes the proper nouns.

Next, the necessary local analysis is carried out. Here, a sequence of analysis units is triggered by the morpheme actuation information of each analysis unit and combined into one analysis unit on the basis of a local analysis rule, for example a grouping by kind. "Mr. Brown", for example, is combined into "Brown shi". Words forming part of a place name are likewise kept together: "Lake Biwa", for example, is combined into "Biwako". In the same way, words forming part of the name of an organization are combined: "Yale University", for example, is analyzed as "Yale Daigaku".
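The three examples of local analysis given above can be reproduced with a very small rule table. This is a hypothetical sketch meant only to show the shape of such a rule: the tables `LEADING_RULES` and `TRAILING_RULES`, the function `render_group`, and the idea of expressing each rule as a suffix are assumptions. The description itself only gives the three input and output pairs.

```python
# Words that open a group and are rendered as a suffix on the rest (assumed form).
LEADING_RULES = {"Mr.": " shi", "Lake": "ko"}
# Words that close a group and are replaced by a Japanese equivalent (assumed form).
TRAILING_RULES = {"University": " Daigaku"}


def render_group(words):
    """Combine one group of proper nouns into a single rendered unit."""
    if len(words) >= 2 and words[0] in LEADING_RULES:
        return " ".join(words[1:]) + LEADING_RULES[words[0]]
    if len(words) >= 2 and words[-1] in TRAILING_RULES:
        return " ".join(words[:-1]) + TRAILING_RULES[words[-1]]
    # No rule applies: keep the group as written.
    return " ".join(words)
```

Applied to the groups ["Mr.", "Brown"], ["Lake", "Biwa"] and ["Yale", "University"], the function yields the three combined forms named in the text.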

With a proper noun such as "Mr. ..." or "Lake ...", there is, contextually, always a break immediately before the word. In the English sentence "With Tom Mr. Brown went to ...", there is therefore a definite contextual break between "Tom" and "Mr.". If "Tom Mr. Brown" were combined into a single proper noun, an error would occur in the subsequent analysis. Similarly, the proper noun "University" is always followed by a break. In the English sentence "At Yale University Tom is ...", it is recognized that there is a break between "University" and "Tom". In this embodiment, information on where each proper noun in a sequence of proper nouns is positionally restricted is stored in the dictionary 5018 as the positional information described above, that is, as the patterns "0" to "3".
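The effect of the positional information on grouping can be illustrated with the two example sentences. The sketch below is a simplified reading of the rule, not the processing of sections 5110 to 5118 themselves: the function name `group_proper_nouns` and the input format (a list of word and pattern pairs, with `None` for words that are not proper nouns) are assumptions, and the handling of the article "the" for pattern 3 is left out.

```python
def group_proper_nouns(units):
    """Group consecutive proper nouns, respecting positional breaks.

    units is a list of (word, pattern) pairs; pattern is 0-3 for a proper
    noun and None for any other word (hypothetical input format).
    """
    groups = []
    current = []

    def flush():
        # Emit the proper nouns collected so far as one combined unit.
        if current:
            groups.append(" ".join(current))
            current.clear()

    for word, pattern in units:
        if pattern is None:
            flush()
            groups.append(word)
            continue
        if pattern == 1:
            flush()          # a pattern-1 word always starts a new group
        current.append(word)
        if pattern in (2, 3):
            flush()          # a pattern-2 or pattern-3 word always ends its group
    flush()
    return groups
```

For "With Tom Mr. Brown went", the sketch keeps "Tom" apart from "Mr. Brown"; for "At Yale University Tom is", it keeps "Yale University" apart from "Tom", which matches the breaks described above.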

Using this positional information, the combined configuration is produced in the processing sections 5110, 5112, 5114, 5116 and 5118. After these operations are completed, the dictionary information for the input character string is stored in the retrieved dictionary information buffer, that is, the dictionary information retention table 5124.

The result of the morphological analysis is transferred from the dictionary information retention table 5124 to the analysis section I 5020.

The processing of the proper noun positional information follows the sequence shown in Figure 40. Input processing is carried out by receiving the input character string data in the input processing section 5100 (5200). The unit cutting section 5102 then cuts the input character string into dictionary lookup units for consulting the dictionary 5018 (5201).

The dictionary lookup section 5104 consults the dictionary 5018 accordingly (5203), and if a dictionary entry is present (5204), its part of speech is examined (5205). If the part of speech is not a proper noun, the proper noun processing of this embodiment is not carried out and the dictionary information is stored in the dictionary information retention table 5124 (5206). If it is a proper noun, the processing for a proper noun registered in the dictionary (5207) is carried out in the positional information processing section 5110 and in the proper noun processing sections 5114, 5116 and 5118. When this processing has been carried out up to the last position of the sentence indicated by the input character string data (5202), the result of the morphological analysis is output to the sentence structure analysis section I 5020 (5210).

If the dictionary look-up finds no entry in step 5204 and the element begins with a capital letter (5212), the element is recognized as a proper noun not registered in the dictionary and is handled as such in 5213, in the section 5112 that judges the preceding part and in the section 5118 that processes the proper noun. If the initial character is not a capital letter, the word, not being registered in the dictionary 5018, is stored in the dictionary information retention table 5124 as an unregistered word (5214). The operation continues up to the last position (5202).
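The branching of Figure 40 described above can be sketched roughly as follows. This is only an illustration: the function name `classify_unit` and the contents of the toy `DICTIONARY` are assumptions and are not taken from the dictionary 5018 of the embodiment.

```python
# Toy dictionary: surface form -> (part of speech, positional information).
# The entries are illustrative assumptions, not the real contents of dictionary 5018.
DICTIONARY = {
    "the": ("article", None),
    "and": ("conjunction", None),
    "River": ("proper noun", "3"),
    "Mr.": ("proper noun", "1"),
}


def classify_unit(unit):
    """Return the branch of the Figure 40 flow taken by one dictionary reference unit."""
    entry = DICTIONARY.get(unit)                 # dictionary look-up (5203)
    if entry is not None:                        # entry present? (5204)
        part_of_speech, _positional_info = entry
        if part_of_speech == "proper noun":      # part of speech examined (5205)
            return "registered proper noun (5207)"
        return "store dictionary information (5206)"
    if unit[:1].isupper():                       # initial capital letter? (5212)
        return "unregistered proper noun (5213)"
    return "store as unregistered word (5214)"


for unit in ["the", "River", "Sumida", "xyzzy"]:
    print(unit, "->", classify_unit(unit))
```

The loop over units stands in for the repetition "up to the last position (5202)"; the actual storage in the retention table 5124 is left out of the sketch.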

The handling 5207 of proper nouns registered in the dictionary is carried out in the processing sections 5110, 5114, 5116 and 5118 according to the flow chart shown in Figure 41. First, the positional information contained in the dictionary information obtained is consulted (5220). Then the proper-noun processing 5221 for pattern 0 is performed if pattern "0" is indicated, the proper-noun processing 5222 for pattern 1 is performed if pattern "1" is indicated, and the proper-noun processing 5223 for patterns 2 and 3 is performed if pattern "2" or "3" is indicated.
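The dispatch on the positional pattern could be written as a simple look-up table. The sketch below is an assumption-laden illustration: `PATTERNS` and `select_processing` are hypothetical names, and the short notes paraphrase the description rather than quote it.

```python
# Positional pattern -> (processing step, paraphrased meaning from the description).
PATTERNS = {
    "0": (5221, "no positional restriction"),
    "1": (5222, 'stands at the beginning of a proper noun, e.g. "Mr."'),
    "2": (5223, "stands at the end of a proper noun"),
    "3": (5223, 'stands at the end of a proper noun, e.g. "River"'),
}


def select_processing(positional_info):
    """Return the processing step selected after step 5220 for a positional pattern."""
    try:
        step, _note = PATTERNS[positional_info]
    except KeyError:
        raise ValueError(f"unknown positional pattern: {positional_info!r}") from None
    return step


for pattern in ("0", "1", "2", "3"):
    print(pattern, "->", select_processing(pattern))
```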

The proper-noun processing 5221 for pattern 0 is carried out in the processing section 5114. It is applied to a proper noun without positional restriction. First, if the part preceding the dictionary reference unit in question is an unregistered proper noun (5230), the whole is joined into a single proper noun with "1" as its proper-noun positional information and stored in the dictionary information retention table 5124 (5233). If the preceding part is a proper noun with positional information pos "1" (5231), the same handling is carried out.

If the preceding part is a proper noun with positional information "0" (5231), the whole is joined into a single proper noun with positional information "0" and stored in the dictionary information retention table 5124 (5235). Otherwise, if the preceding part is not a proper noun with positional information "0", the unit is stored on its own as a proper noun with positional information "0" in the dictionary information retention table 5124 (5234).
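A minimal sketch of the pattern 0 handling is given below. It rests on several assumptions: the retention table is modelled as a Python list of dictionaries, the field names `text`, `kind` and `pos` are invented for the illustration, and "unregistered proper noun" is read as a distinct status of the preceding entry (the status assigned in step 5265), which is an interpretation of the description rather than something it states outright.

```python
def process_pattern0(table, word):
    """Append `word` to `table`, or merge it into the preceding entry (pattern 0)."""
    prev = table[-1] if table else None

    def merged(pos):
        # Join the preceding entry and the current word into one proper noun.
        return {"text": prev["text"] + " " + word, "kind": "proper noun", "pos": pos}

    if prev and (prev["kind"] == "unregistered proper noun"                   # 5230
                 or (prev["kind"] == "proper noun" and prev["pos"] == "1")):  # 5231
        table[-1] = merged("1")                                               # 5233
    elif prev and prev["kind"] == "proper noun" and prev["pos"] == "0":
        table[-1] = merged("0")                                               # 5235
    else:
        # No suitable preceding entry: store the word on its own with pos "0".
        table.append({"text": word, "kind": "proper noun", "pos": "0"})
    return table


table = [{"text": "Mr.", "kind": "proper noun", "pos": "1"}]
process_pattern0(table, "Gold")
process_pattern0(table, "Smith")
print(table)
```

With these assumptions, "Gold" and "Smith" are absorbed one after the other into the entry that starts with "Mr.", which keeps positional information "1".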

The proper-noun processing 5222 for pattern 1 is carried out as described below. This processing is applied to a proper noun such as "Mr." that is situated at the beginning of a single proper noun, or at the beginning of a proper noun belonging to a group in a run of several consecutive proper nouns. First, if the part preceding the dictionary reference unit in question is an unregistered proper noun (5240), that word is converted into an unregistered word (5241). In the case of an unregistered proper noun, the word is stored on its own as a proper noun with positional information pos "1" in the dictionary information retention table 5124 (5242).

The proper-noun processing 5223 for patterns 2 and 3 will now be explained with reference to Figure 44. This processing is applied, for example, to proper nouns such as "Station" or "River" that stand at the end of a single proper noun or that belong, as proper nouns, to a run of several consecutive proper nouns. First, if the part preceding the dictionary reference unit in question is an unregistered proper noun (5250), the unit is joined with the preceding word into a single proper noun that takes the unit's own positional information pos-self as its proper-noun positional information pos, and is stored in the dictionary information retention table 5124 (5255). The same handling is carried out if the preceding part is a single proper noun with positional information "1" (5251).

If the preceding part is not a proper noun with positional information "0" (5252), the unit is made into a proper-noun expression with its own positional information pos-self and stored in the dictionary information retention table 5124 (5257).

If, in step 5252, the preceding part is a proper noun with positional information "0", the unit's own proper-noun positional information pos-self is checked (5253), and the handling 5255 is carried out if the pattern is "2". If pos-self is pattern "3", it is further checked whether the element preceding the dictionary reference unit is "the". If it is not the definite article "the", the handling 5255 is carried out. If it is "the", the group from "the" up to the element under consideration is joined into a proper-noun expression with proper-noun positional information "3" and stored in the dictionary information retention table 5124 (5256).
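The handling of patterns 2 and 3 might be sketched as follows. The list-of-dictionaries table, the field names and the function name `process_pattern23` are assumptions made for the illustration. The sketch also assumes that the check for "the" looks at the entry standing before the preceding proper noun, as in the "the Sumida River" example, and that the comparison is case-sensitive.

```python
def process_pattern23(table, word, pos_self):
    """Handle a proper noun whose own positional pattern `pos_self` is "2" or "3"."""
    prev = table[-1] if table else None

    def merge_last(n):
        # Join the last n table entries and the current word into one proper noun.
        text = " ".join([entry["text"] for entry in table[-n:]] + [word])
        del table[-n:]
        table.append({"text": text, "kind": "proper noun", "pos": pos_self})

    prev_is_proper = prev is not None and prev["kind"] == "proper noun"
    if prev is not None and (prev["kind"] == "unregistered proper noun"        # 5250
                             or (prev_is_proper and prev.get("pos") == "1")):  # 5251
        merge_last(1)                                                          # 5255
    elif not (prev_is_proper and prev.get("pos") == "0"):                      # 5252
        table.append({"text": word, "kind": "proper noun", "pos": pos_self})   # 5257
    elif pos_self == "3" and len(table) >= 2 and table[-2]["text"] == "the":   # 5253
        merge_last(2)                                                          # 5256
    else:
        merge_last(1)                                                          # 5255
    return table


table = [
    {"text": "along", "kind": "preposition", "pos": None},
    {"text": "the", "kind": "article", "pos": None},
    {"text": "Sumida", "kind": "proper noun", "pos": "0"},
]
process_pattern23(table, "River", "3")
print([entry["text"] for entry in table])
```

Under these assumptions the article, the preceding proper noun and "River" end up in a single entry with positional information "3", while a pattern "2" word would absorb only the preceding proper noun.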

For a word that begins with a capital letter and is recognized as an unregistered word, no entry having been found for it in the reference dictionary 5018 by the dictionary look-up 5203, the flow proceeds through steps 5204 and 5212 to the handling 5213, which is carried out in the section 5112 that judges whether the preceding part ends a sentence. First, if the part preceding the dictionary reference unit in question is not a candidate for the end of a sentence, the proper-noun processing 5221 for pattern 0 is carried out as described above in the processing section 5118.

The preceding part can be a candidate for the end of a sentence in the following four cases. The first is the case in which a separate […] is present. The second is the case in which the preceding item ends with a period and the positional information of the proper noun is not "1"; this includes, for example, an abbreviation such as "U.S.A.". The third is the case of a […] or of the sequence of a period and an apostrophe, or of a period and a quotation mark. The last is the case of the beginning of the input character array buffer.

If the preceding part falls under any one of the above four cases, it is recognized as a candidate for the end of a sentence (5261), and the dictionary is consulted again after the initial capital letter of the word has been converted to lower case (5262). If this look-up yields a dictionary entry (5263), the entry is stored in the dictionary information retention table 5124 (5264). If not, the word is recorded in the dictionary information retention table 5124 as an unregistered proper noun, with its initial character left unchanged as a capital letter (5265).
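The sentence-end test and the second look-up in lower case could look roughly like this. The sketch covers only those conditions that are legible in the description (a preceding item ending in a period with positional information other than "1", a period followed by an apostrophe or quotation mark, and the start of the buffer); the function names and the dictionary argument are hypothetical.

```python
def is_sentence_end_candidate(prev):
    """`prev` is None at the start of the input buffer, else a dict with 'text' and 'pos'."""
    if prev is None:                              # beginning of the input buffer
        return True
    text = prev["text"]
    if text.endswith((".'", '."')):               # period followed by apostrophe or quotation mark
        return True
    if text.endswith(".") and prev.get("pos") != "1":
        return True                               # e.g. "U.S.A.", but not "Mr." (pos "1")
    return False


def handle_capitalized_unknown(word, prev, dictionary):
    """Decide what happens to a capitalized word that was not found in the dictionary."""
    if not is_sentence_end_candidate(prev):
        return ("pattern 0 processing (5221)", word)
    lowered = word[0].lower() + word[1:]          # convert the initial capital (5262)
    if lowered in dictionary:                     # entry obtained? (5263)
        return ("store dictionary entry (5264)", lowered)
    return ("store as unregistered proper noun (5265)", word)


toy_dictionary = {"along": "preposition"}         # illustrative content only
print(handle_capitalized_unknown("Along", None, toy_dictionary))
print(handle_capitalized_unknown("Sumida", {"text": "the", "pos": None}, toy_dictionary))
```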

An explanation will now be given by way of an example.

Suppose, for example, that the dictionary is consulted for the input character array "Along the Sumida River Paul and Mr. Gold Smith went ...". The dictionary entry information is first written into the dictionary information retention table 5124 as shown in Figure 46A. For "the", for example, the start position in the sentence is "7", the end position is "9", and the part of speech is an article. No entry is found for the word "Along" at the beginning of the input character array during the dictionary look-up 5203, so this word is treated as unregistered. Because the preceding part satisfies the conditions for a sentence-end candidate, the word standing at the beginning of the input buffer (5260), the capital letter "A" at the beginning is converted to lower case and the look-up 5262 is carried out again in the dictionary for "along".
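The start and end positions quoted for "the" correspond to 1-based, inclusive character positions in the input character array. A small sketch, assuming that units are separated by white space and using the hypothetical helper `unit_positions`:

```python
import re


def unit_positions(sentence):
    """Return (unit, start, end) triples with 1-based, inclusive character positions."""
    return [(m.group(), m.start() + 1, m.end()) for m in re.finditer(r"\S+", sentence)]


sentence = "Along the Sumida River Paul and Mr. Gold Smith went"
for unit, start, end in unit_positions(sentence):
    print(f"{unit!r}: start {start}, end {end}")
```

For this sentence the triple for "the" is `('the', 7, 9)`, matching the positions given for Figure 46A.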

The pointer is then moved on to process "Sumida". In this embodiment this word is not registered in the dictionary 5018. Because the preceding part is not a candidate for the end of a sentence, the flow proceeds to the proper-noun processing 5221 for pattern 0. As shown in Figure 46A, "proper noun" is obtained as the part of speech and "0" as the proper-noun positional information.

The next dictionary reference unit, "River", is a proper noun with proper-noun positional information "3". The preceding part is a proper noun with proper-noun positional information "0", and the part preceding that is "the". As explained above, "the Sumida River" is therefore joined into a single proper noun through steps 5250-5254 and stored with proper-noun positional information "3" in the dictionary information retention table 5124 (Figure 46B).

The next dictionary reference unit, "Paul", is an unregistered proper noun with proper-noun positional information "0", for which the handling 5213 is carried out. Although the preceding word is a proper noun, its positional information is "3", so the two are not joined; the dictionary information is accumulated as it is in the table 5124 (Figure 46C). Normal processing is applied to the conjunction "and" that follows.

The next word, "Mr.", is a proper noun with positional information "1" and is accumulated as it is in the dictionary information retention table 5124 (Figure 46D). Although the proper noun "Paul" precedes it, a word boundary lies immediately before "Mr.", so this word can be kept as such in the dictionary information retention table 5124.

The word "Gold" is likewise a proper noun not registered in the dictionary, and the handling 5213 is applied to it. Because the preceding word "Mr." has positional information "1", the two are joined and the whole is given positional information "1" as a single proper noun (Figure 46E). The same handling is then carried out for the next word, "Smith" (Figure 46F). The following "went" is the past tense of a verb, and normal analysis is then applied to it.

As described above for the present embodiment, proper nouns are joined by using as a key a word that forms part of a group of proper nouns making up a single proper noun and that is subject to a positional restriction when combined into that single proper noun. In this way, even when several proper nouns occur in succession, the correct grouping can be found from the context, without producing an erroneous grouping, simply by dividing them into groups of proper nouns. In the above example, "the Sumida River" is analyzed as one group of proper nouns, separate from the following "Paul". Likewise, "Mr. Gold Smith" is analyzed as a group of proper nouns that belong together.

Next, the sixth embodiment of the present invention will be explained.

Figure 47 illustrates the sixth embodiment of the language analysis device according to the present invention, applied to automatic translation from English into Japanese.

The present embodiment has an input processing section 6014, into which data are entered from an input unit 6012. The input unit 6012 comprises, for example, a keyboard with character keys such as alphanumeric keys and function keys, an optical character reader for reading English text recorded on paper, and a read-out unit for a magnetic disk memory.

The input processing section 6014 has an input character array buffer 6014a and stores the English input sentence entered from the input unit 6012 in the input character array buffer 6014a. The input processing section 6014 reads out the input sentence stored in the input character array buffer 6014a and passes it to the unit cut-out section 6016. The unit cut-out section 6016 is a functional section that cuts the dictionary reference units out of the input sentence received from the input processing section 6014 with the aid of the delimiter table 6018. The delimiter table 6018 contains delimiters such as the space, the comma, etc. The unit cut-out section 6016 reads the delimiters from the delimiter table 6018 and divides the sentence received from the input processing section 6014 into character arrays serving as units for consulting a reference dictionary 6020, by splitting the sentence wherever a delimiter is present. The partial character arrays are fed to the dictionary look-up section 6022.
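Cutting the input sentence at the delimiters can be illustrated with a short sketch. The set `DELIMITERS` holds only the two delimiters the description names (space and comma); what else the delimiter table 6018 contains is not stated, and the choice to discard the delimiters rather than keep them as units is an assumption of this sketch.

```python
# Sample content for the delimiter table; the description names space and comma "etc."
DELIMITERS = {" ", ","}


def cut_units(sentence, delimiters=DELIMITERS):
    """Split `sentence` into dictionary reference units at every delimiter."""
    units, current = [], []
    for ch in sentence:
        if ch in delimiters:
            if current:                      # close the unit collected so far
                units.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:                              # last unit, if the sentence does not end in a delimiter
        units.append("".join(current))
    return units


print(cut_units("Paul, and Mr. Gold Smith went"))
```

Because the period is not in the sample delimiter set, a unit such as "Mr." keeps its trailing period.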

The dictionary look-up section 6022 consults the reference dictionary 6020 with the input sentence that comes from the unit cut-out section 6016 divided into dictionary reference units. As shown in Figure 48, the reference dictionary 6020 contains, for example, entries for the character arrays of the English sentence, their parts of speech, type information, etc. In addition to the proper nouns shown in the figure, the reference dictionary 6020 contains character arrays forming other parts of speech, for example verbs, adjectives, etc. "Proper noun" as a part of speech in this figure means a proper noun that is registered and is to be processed later; it does not refer to the proper noun in the usual grammatical sense. The type information indicates what the proper noun in question expresses, and it is not restricted to a single expression.

The dictionary look-up section 6022 searches the reference dictionary 6020 for each character array divided into dictionary reference units. If the character array is a proper noun, it is passed to the proper-noun processing section 6024 for the proper-noun handling described later. If it is not a proper noun, it is passed to a processing section 6036 and stored in the dictionary information retention table 6036a in the processing section 6036.

The proper-noun processing section 6024 comprises a section 6026 for processing the preceding sentence end, a section 6028 for processing a preceding proper noun, and a section 6030 for processing the proper noun itself.

The section 6026 for processing the preceding sentence end judges whether or not the character array preceding the character array received from the dictionary look-up section 6022 stands at the end of a sentence. If the preceding character array stands at the end of a sentence, the character array to be processed is returned to the dictionary look-up section 6022 after its initial capital letter has been converted to lower case, whereupon the dictionary look-up section 6022 consults the reference dictionary 6020 again. A character array that is not found at the second attempt is judged to be an unregistered proper noun, passed to the processing section 6036 and stored in the dictionary information retention table 6036a. If the character array preceding the input character array does not stand at the end of a sentence, the character array is fed to the processing section 6036 as a proper noun whose type information is unknown and registered in the dictionary information retention table 6036a, as described later.

The section 6028 for processing the preceding proper noun analyzes the type information of the preceding character array passed on from the section 6026 for processing the preceding sentence end and passes the result to the section 6030 for processing the proper noun itself. The section 6030 checks the type information of the proper noun to be analyzed; if type information is not registered for one of the proper noun and the preceding proper noun, the proper noun is joined with the preceding proper noun by means of the registered type information of the other, as described later, and the result is stored in the dictionary information retention table 6036a in the processing section 6036.

The processing section 6036 is provided with the glossary information retention table 6036a. It stores the data sent from the preceding proper name processing section 6028 or from the glossary lookup section 6022 in the table 6036a, then reads out the stored data and outputs it to the structure analysis section 6038. The structure analysis section 6038 performs syntactic analysis on the input sentence that has been read from the glossary information retention table 6036a after undergoing morphological analysis.

The operation of the device will now be explained with reference to the flowchart shown in Figure 49.

First, an English input sentence is read from the input unit 6012 into the input processing section 6014 (6100). The input sentence read into the input processing section 6014 is stored in the input character array buffer 6014a. The sentence stored in the input character array buffer 6014a is then read out to the unit cutout section 6016.

When the input sentence has been supplied, the unit cutout section 6016 reads the delimiters from the boundary table 6018 in order to cut out the glossary reference units (6102). The character arrays forming the input sentence are divided successively, from the beginning, into recovery key character arrays, which are the units used to consult the reference glossary 6020; the division is made at the positions of delimiters such as spaces and commas. It is judged whether or not the division into glossary reference units, i.e. recovery key character arrays, has finished (6104). If a recovery key character array remains (not yet finished), it is sent to the glossary lookup section 6022.
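By way of illustration only, the cutting out of glossary reference units in step 6102 might be sketched in Python as follows. The function name `cut_units` and the delimiter set are assumptions made for this sketch; the description only states that division takes place at delimiters such as spaces and commas, and it does not say whether the delimiters themselves are kept as units, so here they are simply discarded.

```python
def cut_units(sentence, delimiters=(" ", ",")):
    """Divide a sentence into lookup keys at the given delimiters.

    Hypothetical helper: the delimiter set stands in for the contents of
    the boundary table 6018, which the description does not enumerate.
    """
    units = []
    current = []
    for ch in sentence:
        if ch in delimiters:
            # A delimiter closes the unit collected so far (if any).
            if current:
                units.append("".join(current))
                current = []
        else:
            current.append(ch)
    # Flush the last unit when the sentence does not end in a delimiter.
    if current:
        units.append("".join(current))
    return units


if __name__ == "__main__":
    for key in cut_units("In Tokyo Station Mr. Walter"):
        print(key)
```

Each returned unit corresponds to one recovery key character array that would be handed to the glossary lookup section 6022 in turn.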

When the recovery key character array has been sent to the glossary lookup section 6022, the glossary lookup section 6022 consults the reference glossary 6020 using the recovery key character array (6106). It is judged whether or not the recovery key character array is present as an entry in the reference glossary 6020 shown in Figure 48 (6108). If an entry is present, the part-of-speech information stored in the reference glossary 6020 is read out and it is judged whether or not the recovery key character array is a proper name (6110).

If the recovery key character array is not a proper name, the glossary lookup section 6022 sends the information read from the reference glossary 6020 to the processing section 6036, where it is stored in the glossary information retention table 6036a (6112). When the data has been stored in the glossary information retention table 6036a, a flag indicating that the data has been stored is input, together with the data of the immediately preceding recovery key character array, from the processing section 6036 to the unit cutout section 6016. The flow then returns to step 6102, and the unit cutout section 6016 cuts out the next glossary reference unit.

If the recovery key character array is judged in step 6110 to be a proper name, the glossary lookup section 6022 sends the proper name read from the reference glossary 6020 (hereinafter simply called the proper name) to the proper name processing section 6024, together with the data of the preceding recovery key character array, which reaches the glossary lookup section 6022 from the glossary information retention table 6036a in the processing section 6036 via the unit cutout section 6016. The proper name processing section 6024 then carries out the processing of a glossary-registered proper name (6114).

Next, the processing of a glossary-registered proper name will be explained with reference to the flowchart of Figure 50.

The data sent from the glossary lookup section 6022 to the proper name processing section 6024 is passed through the preceding sentence end processing section 6026 to the preceding proper name processing section 6028. The preceding sentence end processing section 6026 plays no part in the processing of a glossary-registered proper name.

In the preceding proper name processing section 6028, it is judged whether or not the recovery key character array preceding the proper name is a proper name not registered in the reference glossary 6020, i.e. a proper name that has undergone the processing of a proper name not registered in the glossary, described later (6200). If it is an unregistered proper name, the whole consisting of the preceding unregistered proper name and the proper name is judged to be a single proper name having the type information of the proper name (6202), and the data is sent to the processing section 6036 and stored in the glossary information retention table 6036a (6214).

If the recovery key character array preceding the proper name is judged in step 6200 not to be an unregistered proper name, the preceding proper name processing section 6028 judges whether or not it is a proper name registered in the reference glossary 6020 (6204). If the recovery key character array preceding the proper name is a registered proper name, it is judged whether or not the type information of the preceding proper name is unknown, i.e. whether or not it is absent from the reference glossary 6020 (6206).

If the type information of the preceding proper name is unknown, the flow proceeds to step 6202, in which the whole consisting of the preceding proper name and the proper name is regarded as a single proper name having the type information of the proper name, and the preceding proper name processing section 6028 sends the data to the processing section 6036. The data sent to the processing section 6036 is stored in the glossary information retention table 6036a (6214).

If the preceding proper name processing section 6028 judges that the type information of the preceding proper name is not unknown, i.e. that it is registered in the reference glossary 6020, the data is transferred from the preceding proper name processing section 6028 to the proper name processing section 6030. The proper name processing section 6030 judges whether or not the type information of the proper name is unknown (6208). If the type information of the proper name is unknown, the proper name processing section 6030 judges the whole consisting of the preceding proper name and the proper name to be a single proper name having the type information of the preceding proper name (6210) and sends the data to the processing section 6036. The data sent to the processing section 6036 is registered in the glossary information retention table 6036a (6214).

If the proper name processing section 6030 judges that the type information of the proper name is not unknown, i.e. that it is registered in the reference glossary 6020, the proper name is judged to be a proper name having the type information taken from the reference glossary 6020, independently of the preceding proper name (6212), and the data is sent to the processing section 6036. The data sent to the processing section 6036 is registered in the glossary information retention table 6036a (6214).
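The decisions of Figure 50 (steps 6200 to 6212) may be easier to follow as a small decision function. The following Python sketch is illustrative only: the name `resolve`, the three status strings, and the use of `None` for "type information unknown" are assumptions of this sketch. The final fall-through also covers the case in which no proper name precedes at all, which the flow description does not spell out but which the worked example treats as a standalone registration.

```python
from typing import Optional, Tuple

UNKNOWN = None  # assumed marker for "type information unknown"


def resolve(prev_status: str,
            prev_type: Optional[str],
            cur_type: Optional[str]) -> Tuple[bool, Optional[str]]:
    """Decide how the current proper name relates to the preceding entry.

    prev_status is "unregistered", "registered", or "other" (the
    preceding entry is not a proper name). Returns a pair
    (join_with_preceding, resulting_type_information).
    """
    if prev_status == "unregistered":       # 6200: yes
        return True, cur_type               # 6202: joined, type of current name
    if prev_status == "registered":         # 6204: yes
        if prev_type is UNKNOWN:            # 6206: yes
            return True, cur_type           # 6202
        if cur_type is UNKNOWN:             # 6208: yes
            return True, prev_type          # 6210: joined, type of preceding name
    return False, cur_type                  # 6212: kept separate, own type


if __name__ == "__main__":
    print(resolve("unregistered", UNKNOWN, "place"))
    print(resolve("registered", "place", "person"))
    print(resolve("registered", "person", UNKNOWN))
```

In every branch the result would then be stored in the glossary information retention table 6036a (6214).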

Referring again to Figure 49, if it is found in step 6108 that the reference glossary 6020 has no entry for the recovery key character array, it is judged whether or not the first character of the recovery key character array is a capital letter (6116). If it is not a capital letter, the glossary lookup section 6022 judges the recovery key character array to be an unregistered word, sends it to the processing section 6036, and stores it in the glossary information retention table 6036a (6118).

If the first character is a capital letter, the data of the recovery key character array is sent, together with the data of the preceding recovery key character array, from the glossary lookup section 6022 to the proper name processing section 6024, where the processing of a proper name not registered in the glossary is carried out (6120).
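The four outcomes of the lookup in Figure 49 (steps 6106 to 6120) can be summarized, purely as a sketch, by the following Python function. The glossary layout (a dictionary mapping each key to a record with a `proper` flag) and the name `dispatch` are hypothetical and are not taken from Figure 48.

```python
def dispatch(key, glossary):
    """Return the branch of Figure 49 that a lookup key would take.

    glossary is an assumed stand-in for the reference glossary 6020:
    a dict mapping a key to a record such as {"proper": True}.
    """
    entry = glossary.get(key)                    # 6106 / 6108: consult the glossary
    if entry is not None:
        if entry["proper"]:                      # 6110: entry is a proper name
            return "registered proper name processing (6114)"
        return "store glossary data (6112)"
    if key[:1].isupper():                        # 6116: first character is a capital
        return "unregistered proper name processing (6120)"
    return "store as unregistered word (6118)"


if __name__ == "__main__":
    sample = {"in": {"proper": False}, "Station": {"proper": True}}
    for key in ["in", "Station", "Tokyo", "xyzzy"]:
        print(key, "->", dispatch(key, sample))
```

The function only names the branch; the processing carried out in each branch is described in the surrounding text.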

The processing of a proper name not registered in the glossary will now be explained with reference to Figure 51.

The data of the recovery key character array is sent, together with the data of the preceding entry registered in the glossary information retention table, to the preceding sentence end processing section 6026, where it is judged whether or not the end of the preceding entry in the glossary information retention table is a candidate for the end of a sentence (6300). This judgment is made by checking whether the preceding entry in the glossary information retention table ends with a sentence-end candidate, such as a standalone period (.) or the like.

If the end of the preceding entry registered in the glossary information retention table is a candidate for the end of a sentence, the data is sent from the preceding sentence end processing section 6026 to the preceding proper name processing section 6028. The section 6028 judges the preceding entry registered in the glossary information retention table to be the end of a sentence (6302) and, after changing the capital letter at the beginning of the recovery key character array to a lowercase letter, sends the recovery key character array to the glossary lookup section 6022.

The glossary lookup section 6022 searches the reference glossary 6020 for the recovery key character array converted to begin with a lowercase letter (6304) and judges whether or not an entry is present in the reference glossary 6020 (6306). If an entry is found, the glossary lookup section 6022 sends the data from the reference glossary 6020 to the processing section 6036 and stores it in the glossary information retention table 6036a (6308). If no entry is present, the glossary lookup section 6022 converts the first character of the recovery key character array back to a capital letter, sends the word to the processing section 6036 as an unregistered proper name, and stores it in the glossary information retention table 6036a (6310).

If in step 6300 the preceding sentence end processing section 6026 judges that the end of the preceding entry in the glossary information retention table is not a candidate for the end of a sentence, the data is supplied from the section 6026 to the preceding proper name processing section 6028, and the section 6028 judges that the preceding entry registered in the glossary information retention table is not the end of a sentence (6312). The data is sent from the preceding proper name processing section 6028 to the proper name processing section 6030, and the section 6030 judges the recovery key character array to be a proper name whose type information is unknown (6314).

The proper name processing section 6030 returns the data to the preceding proper name processing section 6028, and the processing of a glossary-registered proper name is carried out in the section 6028 (6316). This processing is the same as that shown in Figure 50.
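As a hedged illustration of Figure 51, the following Python sketch classifies a capitalized key that has no glossary entry. The names `classify_unregistered` and `SENTENCE_END` are assumptions of this sketch, as is the convention that a missing preceding entry (`None`, the beginning of the file) counts as a sentence boundary. Only the standalone period named in the description is included as a sentence-end candidate.

```python
SENTENCE_END = {"."}  # assumed set; the description names only a standalone period


def classify_unregistered(key, prev_text, glossary):
    """Classify a capitalized key that was not found in the glossary.

    prev_text is the text of the preceding entry in the retention table,
    or None when there is no preceding entry. Returns (text, label).
    """
    at_sentence_start = prev_text is None or prev_text in SENTENCE_END   # 6300
    if at_sentence_start:
        lowered = key[0].lower() + key[1:]      # 6302: capital -> lowercase
        if lowered in glossary:                 # 6304 / 6306: second lookup
            return lowered, "glossary word"     # 6308
        # 6310: restore the capital and keep the original spelling
        return key, "unregistered proper name"
    # 6312 / 6314: mid-sentence capital; Figure 50 processing follows (6316)
    return key, "proper name, type unknown"


if __name__ == "__main__":
    words = {"in"}
    print(classify_unregistered("In", None, words))
    print(classify_unregistered("Tokyo", "in", words))
```

A key labelled "proper name, type unknown" would next be handled by the same processing as a glossary-registered proper name.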

Returning to Figure 49, when the cutting out of glossary reference units is found in step 6104 to have finished, the data registered in the glossary information retention table 6036a is output from the processing section 6036 to the structure analysis section 6038 (6122), which completes the morphological analysis in this embodiment.

The operation of the device described above will now be explained with reference to an example input sentence.

The explanation refers to Figure 52, in which, for example, the input sentence "In Tokyo Station Mr. Walter —" is entered. First, the input processing (6100) is carried out to read the input sentence into the input processing section 6014. Next, the glossary reference units are cut out (6102) by dividing the input sentence into individual words at the spaces. The reference glossary 6020 is first searched for "In" (6106). There is no entry for "In" in the reference glossary 6020. The processing of a proper name not registered in the glossary therefore follows and, because the preceding part is recognized as the beginning of a sentence (the beginning of the file), "In" is converted to "in". Because "in" has an entry in the reference glossary 6020 and is not a proper name (6110), the data taken from the reference glossary 6020 is registered in the glossary information retention table 6036a (6112).

Next, the reference glossary 6020 is searched for "Tokyo" (6106). Because there is no entry for "Tokyo" in the reference glossary 6020 (6108) and its first character is a capital letter (6116), the processing of a proper name not registered in the glossary is carried out (6120). The flow continues in Figure 51. Because the preceding entry is "In", which is not a candidate for the end of a sentence (6300), "In" is recognized as not being the end of a sentence (6312), "Tokyo" is recognized as a proper name whose type information is unknown (6314), and the processing of a glossary-registered proper name is carried out (6316). The flow continues in Figure 50. Because the preceding "In" is neither an unregistered proper name (6200) nor a registered proper name (6204), "Tokyo" is registered on its own as a proper name with its own type information, i.e. as a proper name whose type information is unknown (6216).

The flow then returns to Figure 49, and the reference glossary 6020 is searched for "Station" (6106). Because there is an entry for "Station" in the reference glossary 6020 (6108) and it is a proper name (6110), the processing of a glossary-registered proper name is carried out (6114). The flow continues in Figure 50. Because the preceding "Tokyo" is an unregistered proper name (6200), the whole of "Tokyo Station" is registered as a proper name with the type information "place" belonging to "Station" (6202).

Next, the reference glossary 6020 in Figure 49 is searched for "Mr." (6106). Because there is an entry for "Mr." in the reference glossary 6020 and it is a proper name (6110), the processing of a glossary-registered proper name is carried out (6114). The flow continues in Figure 50.

The preceding "Station" is not an unregistered proper name (6200) but a registered proper name (6204), and its type information "place" is not unknown (6206). Because "Mr." has the type information "person", which is not unknown (6208), "Mr." is registered separately, on its own, as a proper name with the type information "person" (6212).

Returning once more to Figure 49, the reference glossary 6020 is searched for "Walter" (6106). Because "Walter" has an entry in the reference glossary 6020 (6108) and is a proper name (6110), the processing of a glossary-registered proper name is carried out (6114). The flow continues in Figure 50. Because the preceding "Mr." is not an unregistered proper name (6200) but a registered proper name (6204) whose type information "person" is not unknown (6206), while the type information of "Walter" is unknown (6208), "Mr. Walter" is joined and registered as a single proper name with the type information "person" (6210).
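The walk-through above can be reproduced with a compact, self-contained Python sketch. It is not the claimed apparatus: the `Entry` record, the toy `GLOSSARY`, the function names, and the rule that a joined name counts as registered when either part was registered are all assumptions made so that the example follows the steps of Figures 49 to 51. Splitting is done at spaces only, as in the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    text: str
    proper: bool = False
    registered: bool = False
    type_info: Optional[str] = None   # None stands for "unknown"


# Toy stand-in for the reference glossary 6020: key -> (is proper name, type)
GLOSSARY = {
    "in": (False, None),
    "Station": (True, "place"),
    "Mr.": (True, "person"),
    "Walter": (True, None),
}


def add_proper(table, cur):
    """Figure 50: join cur with a preceding proper name when type info is missing."""
    prev = table[-1] if table else None
    if prev is not None and prev.proper:
        if not prev.registered or prev.type_info is None:    # 6200 or 6206: yes
            joined_type = cur.type_info                       # 6202
        elif cur.type_info is None:                           # 6208: yes
            joined_type = prev.type_info                      # 6210
        else:
            table.append(cur)                                 # 6212: kept separate
            return
        table[-1] = Entry(prev.text + " " + cur.text, True,
                          prev.registered or cur.registered, joined_type)
        return
    table.append(cur)                 # no proper name precedes: stored on its own


def analyse(sentence):
    table = []                                        # retention table 6036a
    for key in sentence.split():                      # 6102: cut at spaces
        if key in GLOSSARY:                           # 6106 / 6108
            proper, type_info = GLOSSARY[key]
            if proper:                                # 6110 -> 6114
                add_proper(table, Entry(key, True, True, type_info))
            else:
                table.append(Entry(key))              # 6112
        elif key[:1].isupper():                       # 6116 -> 6120 (Figure 51)
            lowered = key[0].lower() + key[1:]
            at_start = not table or table[-1].text == "."     # 6300
            if at_start and lowered in GLOSSARY:      # 6304 / 6306
                table.append(Entry(lowered))          # 6308 (treated as a plain word)
            elif at_start:
                table.append(Entry(key, True, False)) # 6310
            else:
                add_proper(table, Entry(key, True, False))    # 6314 / 6316
        else:
            table.append(Entry(key))                  # 6118
    return [(e.text, e.type_info) for e in table]


if __name__ == "__main__":
    print(analyse("In Tokyo Station Mr. Walter"))
```

Under these assumptions the table ends up holding "in" as an ordinary word, "Tokyo Station" with the type information "place", and "Mr. Walter" with the type information "person", matching the registrations described in the example.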

As described above, in the present embodiment the English input sentence is divided into retrieval-key character strings, which are first used to search the reference dictionary 6020. If an entry is present in the reference dictionary 6020 as a proper name, the processing for a registered proper name is performed. This processing takes into account the preceding entry registered in the dictionary information retention table. If that preceding entry is a proper name, its type information is checked and the current proper name is processed accordingly. If one of the two lacks type information, the type information of the other is assigned to it; if both have type information, they are recognized as individual proper names, each with its own type information.
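
The merging rule summarized above can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the names `Entry` and `register_proper_name` are hypothetical, and the handling of two proper names that both lack type information (kept separate here) is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    text: str
    is_proper_name: bool
    type_info: Optional[str] = None  # None means "type information unknown"


def register_proper_name(table: List[Entry], current: Entry) -> None:
    """Register a dictionary-registered proper name in the retention table.

    If the preceding entry is a proper name and exactly one of the two has
    type information, both are merged and share that type information.
    Otherwise the current name is stored as an entry of its own.
    """
    prev = table[-1] if table else None
    exactly_one_typed = (
        prev is not None
        and prev.is_proper_name
        and (prev.type_info is None) != (current.type_info is None)
    )
    if exactly_one_typed:
        shared_type = prev.type_info or current.type_info
        table[-1] = Entry(f"{prev.text} {current.text}", True, shared_type)
    else:
        # Both typed (per the text) or both untyped (assumption): keep separate.
        table.append(current)


table: List[Entry] = []
register_proper_name(table, Entry("Station", True, "place"))
register_proper_name(table, Entry("Mr.", True, "person"))
register_proper_name(table, Entry("Walter", True, None))
```

With the example from the text, "Station" and "Mr." remain separate entries, while "Mr." and "Walter" are merged into "Mr. Walter" with the type information "person".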

It is therefore possible to give suitable type information to a proper name that has none, and also to narrow down existing type information more accurately. This makes the subsequent structure analysis considerably more effective and also makes an adequate translation possible.

Furthermore, if the first character of a character string not registered in the reference dictionary 6020 is a capital letter and the preceding character string has been judged to be the end of a sentence, the reference dictionary 6020 is searched again after the capital letter has been changed to a lowercase letter. The reference dictionary 6020 can therefore also be searched for a character string at the beginning of a sentence. If a character string beginning with a capital letter appears somewhere other than at the beginning of a sentence, it is judged to be a proper name and is given the type information of a proper name that precedes or follows it. In this way, a proper name not registered in the reference dictionary 6020 can be analyzed to a certain extent.
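
The handling of capitalized strings can be sketched as below. The function name `classify_string`, the plain `dict` used as a dictionary, and the result labels are all hypothetical. The fallback for a sentence-initial string that is still not found after lowercasing (reported as "unknown") is an assumption, since the text does not specify it.

```python
from typing import Dict, Optional, Tuple


def classify_string(
    word: str,
    prev_ends_sentence: bool,
    dictionary: Dict[str, str],
    neighbor_type: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (status, information) for one character string."""
    if word in dictionary:
        return ("registered", dictionary[word])
    if word[:1].isupper():
        if prev_ends_sentence:
            # Sentence-initial capital: retry the lookup in lowercase.
            lowered = word[0].lower() + word[1:]
            if lowered in dictionary:
                return ("registered", dictionary[lowered])
            return ("unknown", None)  # assumption: not covered by the text
        # Capital inside a sentence: treat as a proper name and borrow the
        # type information of a neighboring proper name, if there is one.
        return ("proper_name", neighbor_type)
    return ("unknown", None)


dictionary = {"the": "article", "Mr.": "person"}
print(classify_string("The", True, dictionary))
print(classify_string("Walter", False, dictionary, neighbor_type="person"))
```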

Next, the seventh embodiment of the present invention is described.

Figure 54 illustrates the overall structure of the seventh embodiment of the language analysis device according to the present invention, applied to a device for automatic translation from English into Japanese.

This embodiment has an input section 7010 through which an English text 7012 to be translated into Japanese is entered. The input section 7010 may include, for example, a keyboard with character keys such as alphanumeric keys and function keys, an optical character reader (OCR) that reads an English text recorded on paper, and/or a file storage device for reading an English text recorded on a storage medium such as a magnetic disk.

The English text entered via the input section 7010 is read into a pre-editing section 7014, in which the preprocessing for the translation is performed. Here, mainly sentence recognition and unknown-word handling are carried out. These functions form part of the morphological analysis.

The pre-edited English data is transferred, together with the information obtained in pre-editing, to a morphological analysis section 7016. Section 7016 analyzes the morphemes of the English sentence while dividing it by reference to the dictionary 7018, performs various kinds of classification such as the handling of unknown words and of expressions for proper names, time, numerals and so on, and performs operations on the sentence as a whole, such as searching for fixed expressions and idioms. The morphological analysis rules are stored in the analysis rule file 7036.

The English data that has undergone morphological analysis is transferred, together with the dictionary information obtained from the analysis, to a structure analysis section I 7020. The structure analysis section I 7020 is a functional section that analyzes the surface structure of the sentence by applying grammatical rules to the English data in order to find all structural possibilities.

The English data analyzed in the structure analysis section I 7020 is sent, together with the analysis information, to a structure analysis section II 7022, in which one solution is selected from the result of the surface-structure analysis in the structure analysis section I by applying a syntax analysis. In this way a plausible analysis structure of the English sentence is prepared and produced. The analysis rules are also stored in the analysis rule file 7036.

After the analysis, the English data is transferred, as analysis-structure data, to a structure transformation section 7024. The structure transformation section 7024 prepares a corresponding Japanese structure from the structure that forms an intermediate structure with the English structure, and transforms it into an underlying Japanese structure from which a Japanese sentence can easily be generated.

The analysis-structure data representing the Japanese underlying structure transformed in this way is transferred to a translation generating section 7026, in which the translated sentence is generated. This is a functional section in which the Japanese sentence is generated from the Japanese analysis structure.

The data for the sentence translated into Japanese, that is, the translated sentence data, is sent to a post-editing section 7030. The post-editing section 7030 modifies the translated sentence data by reference to the dictionary 7018, using the information applied in the translation process, in order to form a more natural Japanese sentence. The data for the Japanese sentence is transferred to an output section 7032 and output from there as the translated Japanese sentence 7034. The output section may include, for example, a printing unit, a display unit and/or a file storage device such as a magnetic disk memory.

The sequence of translation steps is controlled by a control section 7038, which controls the device as a whole.
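
The order in which the sections hand data to one another can be sketched as a simple pipeline. This is only an illustration of the data flow: the stage bodies are placeholders that record their reference numeral, and `make_stage`, `PIPELINE` and `run_pipeline` are hypothetical names, not part of the described device.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[int, str, Callable[[Dict], Dict]]


def make_stage(numeral: int, name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def run(data: Dict) -> Dict:
        out = dict(data)
        out["trace"] = data["trace"] + [numeral]
        return out
    return (numeral, name, run)


# Processing sections in the order described in the text.
PIPELINE: List[Stage] = [
    make_stage(7014, "pre-editing"),
    make_stage(7016, "morphological analysis"),
    make_stage(7020, "structure analysis I"),
    make_stage(7022, "structure analysis II"),
    make_stage(7024, "structure transformation"),
    make_stage(7026, "translation generation"),
    make_stage(7030, "post-editing"),
]


def run_pipeline(text: str) -> Dict:
    """Play the role of the control section: run every stage in order."""
    data: Dict = {"text": text, "trace": []}
    for _numeral, _name, run in PIPELINE:
        data = run(data)
    return data


print(run_pipeline("He arrived on 26 Jan.")["trace"])
```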

In this embodiment, the dictionary 7018 stores the dictionary data for the English and Japanese words. It holds the vocabularies as well as their interrelations, that is, connective relations, and various information such as meaning, singular or plural form, part of speech and so on. The rule data for the morphological analysis and the syntax analysis is stored in the analysis rule file 7036.

The control section 7038 is connected to an operation display section 7040. The operation display section 7040 contains operation keys through which an operator can give various instructions to the device, for example a translation instruction key and a cursor key, and a display unit or screen on which the entered English text, the Japanese text resulting from the translation, any intermediate data such as dictionary information, and various instructions for the operator can be displayed.

The device may be configured so that most of the operation display functions are incorporated in the keyboard, if one is provided in the input section 7010, or in a display unit, if one is provided in the output section 7032.

Figure 53 shows a detailed structure of the morphological analysis section 7016, in particular for the processing of numerals. Only those parts directly relevant to an understanding of the present invention are shown; the analysis section 7016 naturally has other functional analysis sections as well.

The section 7016 has an input processing section 7100 for receiving and processing the input character string data supplied by the pre-editing section 7014. The input processing section 7100 has an input character string buffer into which the English character string is entered in coded form, for example as ASCII code data, and in which the character string data is temporarily accumulated.

The input character string data temporarily accumulated in the input processing section 7100 is supplied to a unit extraction section 7102, which divides the data into dictionary reference units such as words. The unit extraction section 7102 is a functional section that identifies the dictionary reference units making up the character string; these units are used by the subsequent dictionary lookup section 7106 to access the dictionary 7018. The dictionary reference delimiters used when extracting a dictionary reference unit are placed at the position of any character other than an English letter, a numeric character, an apostrophe, a hyphen or a period, and also at an apostrophe that follows a blank character. They are stored in the delimiter table 7104 and used during the extraction of dictionary reference units in the unit extraction section 7102.
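
A minimal sketch of the unit extraction follows, under stated assumptions. The delimiter rule in the source is partly garbled; the reading used here is that letters, digits, apostrophes, hyphens and periods belong to a unit, every other character is a delimiter, and an apostrophe directly after a blank is also a delimiter. Emitting non-blank delimiters as units of their own is a further assumption, made because the dictionary is said to contain entries for commas. `UNIT_CHARS` and `cut_units` are hypothetical names.

```python
import string
from typing import List

# Characters assumed to belong to a dictionary reference unit.
UNIT_CHARS = set(string.ascii_letters + string.digits + "'-.")


def cut_units(text: str) -> List[str]:
    """Split a character string into dictionary reference units."""
    units: List[str] = []
    current = ""
    prev = " "  # treat the start of the text like a preceding blank
    for ch in text:
        is_delimiter = ch not in UNIT_CHARS or (ch == "'" and prev.isspace())
        if is_delimiter:
            if current:
                units.append(current)
                current = ""
            if not ch.isspace():
                units.append(ch)  # assumption: keep e.g. "," as its own unit
        else:
            current += ch
        prev = ch
    if current:
        units.append(current)
    return units


print(cut_units("26 Jan., '80 he"))
```

Under these assumptions the sample string is cut into the units "26", "Jan.", ",", "'", "80" and "he".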

The dictionary 7018 contains, in particular, the information used to look up an extracted unit. It also contains morphological analysis information for items such as the names of months, days of the week, cardinal numbers that have only a numerical meaning, ordinal numbers, units such as grams, time, "the", "of", commas (,), periods (.) and so on.

The dictionary lookup section 7106 is a functional section that accesses the dictionary 7018 to retrieve dictionary information based on the character string supplied by the unit extraction section 7102. This information is transferred to the morphological analysis information providing section 7108.

The morphological analysis information providing section 7108 holds information (see also Figure 56) indicating that a sequence of characters with morpheme properties has a time meaning such as hour, year, month and so on. It also attaches further specific information to a character string that the dictionary lookup section 7106 has recognized as containing a cardinal number or a time meaning.

For "year", for example, the information "numeric value" is provided.

The character string provided with the analysis information in the information providing section 7108 is then subjected to the necessary local analysis.

In this case, a group of dictionary reference units, such as words, that has been marked by the morpheme processing information is joined into a single unit by means of the local analysis rules. For example, "name of the month" and "numeric expression" are joined into "name of the month + numeric expression", so that "Oct." and "18" are grouped into "Oct.18". Other combinations are formed in the same way: "November the 2nd" is combined as "name of the month + the + numeric expression"; "22 March" as "numeric expression + name of the month"; "the 23rd May" as "the + numeric expression + name of the month"; "the 11th of June" as "the + numeric expression + of + name of the month"; "'86, Jan. 27. Mon." as "year + , + month and day + , + day of the week"; "Sunday, 26, Jan., 1986" as "day of the week + , + month and day + , + year"; and "11:30 a.m." as "numeric value : numeric value + a.m. (or p.m.)". Further combinations include "name of the month + year", "name of the month + of + year" and so on.
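
The way such local analysis rules classify a group of units can be sketched as a small pattern table. This is an illustration only: the category labels, the regular expression for numeric expressions and the names `MONTHS`, `PATTERNS`, `categorize` and `match_date_pattern` are hypothetical, and only five of the combinations listed in the text are included.

```python
import re
from typing import List, Optional

_NAMES = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
# Full month names plus three-letter abbreviations such as "oct.".
MONTHS = set(_NAMES) | {name[:3] + "." for name in _NAMES}

# A numeric expression: digits with an optional ordinal suffix.
NUMERIC = re.compile(r"^\d+(st|nd|rd|th)?$")

# Subset of the combinations named in the text.
PATTERNS = {
    ("month", "number"),
    ("month", "the", "number"),
    ("number", "month"),
    ("the", "number", "month"),
    ("the", "number", "of", "month"),
}


def categorize(unit: str) -> Optional[str]:
    low = unit.lower()
    if low in MONTHS:
        return "month"
    if low in ("the", "of"):
        return low
    if NUMERIC.match(low):
        return "number"
    return None


def match_date_pattern(units: List[str]) -> Optional[str]:
    """Return the matching combination as text, or None if there is none."""
    categories = tuple(categorize(u) for u in units)
    if categories in PATTERNS:
        return " + ".join(categories)
    return None


print(match_date_pattern(["the", "11th", "of", "June"]))
```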

The local analysis is performed by an initial value setting section 7110, a match lookup section 7112, a unit extraction section 7114, a morpheme processing information providing section 7118, the lookup sections 7116 and 7120 and the processing sections 7122 and 7124, together with a match table 7128. The match table 7128 contains a morpheme processing indication table, a reference table used to recognize that a sequence of units containing numerals and time elements forms a composite time expression obeying a certain rule, as shown in Figure 57. The initial value setting section 7110 sets the initial value of a counter n, which counts the number of dictionary reference units for which a match is found when the match lookup section 7112 successively looks up dictionary reference units as a group, as described above.

The match lookup section 7112 searches the match table 7128 for each of the dictionary reference units in order to find a match. The unit extraction section 7114 extracts, from the dictionary reference units forming the character string, the units that follow the unit designated "p", that is, the unit for which the dictionary lookup in the dictionary lookup section 7106 has been completed, using the counter n.

The lookup section 7116 is a functional section with a function similar to that of the dictionary lookup section 7106: it accesses the dictionary 7018 to retrieve dictionary information based on the character string extracted by the unit extraction section 7114, and this information is transferred to the morpheme processing information providing section 7118. The section 7118 has the same function as the information providing section 7108: it attaches further specific information to each unit that the lookup section 7116 has recognized as a number or a time element.

The lookup section 7120 and the processing sections 7122 and 7124 merge the sequence of dictionary reference units up to "p+n", obtained from the match lookup section 7112 via the processing in the morpheme processing information providing section 7118, into a single dictionary reference unit. The result is then stored in the dictionary information retention table 7126, which serves as a buffer for the dictionary information obtained during lookup.
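
The interplay of the position p, the counter n and the match table can be sketched as a longest-match loop. This is a simplified reading rather than the device's actual procedure: the match table is modeled as a set of category tuples, the category names and the sample units are made up for illustration, and `longest_match` and `merge_group` are hypothetical names.

```python
from typing import List, Optional, Set, Tuple


def longest_match(categories: List[str], p: int,
                  table: Set[Tuple[str, ...]]) -> Optional[int]:
    """Return the largest n such that units p..p+n form a table entry.

    n starts at 0 and is increased for as long as units p..p+n are still
    the beginning of at least one combination in the match table.
    """
    n = 0
    best: Optional[int] = None
    while p + n < len(categories):
        prefix = tuple(categories[p:p + n + 1])
        if not any(entry[:n + 1] == prefix for entry in table):
            break  # no combination starts like this, so stop extending
        if prefix in table:
            best = n  # a complete combination ends at unit p+n
        n += 1
    return best


def merge_group(units: List[str], p: int, n: int) -> List[str]:
    """Replace units p..p+n by one combined dictionary reference unit."""
    return units[:p] + [" ".join(units[p:p + n + 1])] + units[p + n + 1:]


table = {("number", "month"), ("number", "month", "year")}
units = ["26", "Jan.", "1980", "he"]          # illustrative units only
categories = ["number", "month", "year", "pronoun"]
n = longest_match(categories, 0, table)
if n is not None:
    units = merge_group(units, 0, n)
print(units)
```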

The result of the morphological analysis is transferred from the dictionary information retention table 7126 to the structure analysis section I 7020.

The morpheme processing according to the present invention is explained below with reference to the flowcharts in Figures 55A and 55B.

Assume, for example, that the following character string is entered into the input processing section 7100 (7300). Input character string: "..26 Jan., '80 he.."

The unit extraction section 7102 divides the input character string into dictionary reference units for accessing the dictionary 7018 (7302). "26" in the input character string is extracted as a dictionary reference unit. It is then judged whether the extraction of dictionary reference units from this input character string has been completed; if so, the operation ends (7304), and if not, the flow proceeds to the next step 7306.

The dictionary 7018 is accessed for "26" in the input character string, and dictionary information indicating that "26" is "cardinal number, cardinal number" is found (7306). Next, morpheme processing information is provided, indicating that "cardinal number, cardinal number" has a morpheme property, that is, that a sequence of numerals is to be treated as a grouped cardinal number (7308). It is then judged whether the group for which the dictionary information was obtained has been provided with the morpheme processing information in step 7308 (7310). If it has, the flow proceeds to step 7314 for further processing based on the local analysis rules; if not, the group is recorded in the dictionary information retention table 7126 (7312) and the flow returns to step 7302. "26", having been provided with the morpheme processing information, is therefore passed to step 7314.
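
The branching in this part of the flow can be sketched as a loop over the extracted units. This is a rough illustration under assumptions: the dictionary is modeled as a plain `dict` from unit to category, the set `grouped_categories` stands in for the morpheme processing information, and the local analysis of step 7314 is reduced to collecting the units routed to it. `process_units` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def process_units(
    units: List[str],
    dictionary: Dict[str, str],
    grouped_categories: Set[str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Route each unit either to local analysis or to the retention table."""
    retention_table: List[Tuple[str, str]] = []
    sent_to_local_analysis: List[str] = []
    for unit in units:                                # steps 7302 and 7304
        info = dictionary.get(unit, "unknown")        # step 7306
        if info in grouped_categories:                # steps 7308 and 7310
            sent_to_local_analysis.append(unit)       # step 7314
        else:
            retention_table.append((unit, info))      # step 7312
    return retention_table, sent_to_local_analysis


dictionary = {"26": "cardinal number", "he": "pronoun"}
print(process_units(["26", "he"], dictionary, {"cardinal number"}))
```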

The operation in step 7314 is performed in accordance with the flowchart shown in Figure 55B.

First, an initial value "0" is set in the counter n, which counts the number of matching glossary reference units as they are used by the match lookup section 7112 (7410). With "p" denoting the glossary reference unit for which the glossary lookup by the glossary lookup section 7016 has been completed, the match table 7128 is addressed from the match lookup section 7112 with the (p+n)-th (n=0) glossary reference unit, that is, "26" (7412). Because "26" was given morpheme processing information in step 7308 indicating that it is a cardinal number, and because the match table 7128 (see also Figure 57) contains combinations that begin with "cardinal number" and continue with a second and further entries, the glossary reference unit "26" agrees with the information in the match table 7128 and a match is found. In this case the match is obtained for Ms-Me, where the second match pattern in the match table 7128 is taken as "Ms" and the last entry among the combinations beginning with "cardinal number" is taken as "Me".

Based on the result of looking up the (p+n)-th (n=0) glossary reference unit in the match table 7128, the match state is judged (7414). If a match is established, the flowchart proceeds to step 7416; if no match is established, it proceeds to step 7424.

If a match is established, "1" is set in the counter n and the glossary reference unit at p+1 (n=1) is cut out of the input character array. The cutout is performed in the same manner as in step 7302. The glossary 7018 is looked up for "Jan.,", which has been cut out as the glossary reference unit following "26" in the character array, in order to obtain the morpheme processing information (7420, 7422). This operation is performed in the same manner as in steps 7306 and 7308.

By repeating the procedure of steps 7412 through 7422 as above, the flowchart advances through "26 Jan., '80 he". However, because "he" produces no match when the match table 7128 is searched in step 7412, the flowchart proceeds from step 7414 to step 7424. That is, while the data up to "26 Jan., '80" have been matched as "cardinal number, month, year" in the match table 7128, no match is found for "26 Jan., '80 he".

Furthermore, if the sentence in the input character array ends, for example, with "26 Jan., '80", that is, if no further glossary reference unit can be cut out, the flowchart proceeds from step 7418 to step 7424.

If no match is recognized in step 7414, it is judged whether the content of the counter n is greater than 1 (7424). If it is not greater than 1, the unit is registered as a single glossary reference unit in the glossary information retention table 7126 (7434).

If it is not smaller than 1, the matching procedure is performed for p+n (n=3); that is, "he" in "26 Jan., '80 he" is regarded as "end of sentence", which marks the end of the phrase (7426, 7428). If no match is found, the flowchart proceeds to step 7434. If a match is found, "26 Jan., '80", which spans glossary reference units p through (p+n-1), is merged according to the merge result corresponding to Ms in the match table 7128, and the result is registered in the glossary information retention table 7126 (7430).

Thereafter the glossary reference units are regarded as having been processed up to the (p+n-1)-th unit, and "p" is reset to (p+n-1) (7432).
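The matching procedure of Figure 55B can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `CATEGORY`, `MATCH_TABLE` and `combine`, the table contents and the merged label "date" are hypothetical and merely mirror the "26 Jan., '80 he" example. The "end of sentence" check of steps 7426 and 7428 is omitted, and the sketch merges only when more than one unit has matched.

```python
# Hypothetical category lookup standing in for the glossary 7018 together
# with the morpheme processing information attached in step 7308.
CATEGORY = {"26": "cardinal", "Jan.,": "month", "'80": "year"}

# Hypothetical match table 7128: category sequence -> merged result.
MATCH_TABLE = {
    ("cardinal", "month", "year"): "date",
    ("cardinal", "month"): "date",
}


def begins_a_pattern(cats):
    """True if cats is the beginning of at least one pattern in the table."""
    return any(pattern[:len(cats)] == tuple(cats) for pattern in MATCH_TABLE)


def combine(units, p):
    """Starting at unit p, extend the match while the table allows it.

    Returns (entry, next_p), where next_p is the first unprocessed unit.
    A unit that cannot be combined is returned on its own (step 7434).
    """
    n = 0       # counter n: number of matching units so far (step 7410)
    cats = []
    while p + n < len(units):              # no further unit: stop (step 7418)
        cat = CATEGORY.get(units[p + n])
        if cat is None or not begins_a_pattern(cats + [cat]):
            break                          # no match (step 7414 -> 7424)
        cats.append(cat)
        n += 1
    if n > 1 and tuple(cats) in MATCH_TABLE:
        merged = " ".join(units[p:p + n])  # merge units p..p+n-1 (step 7430)
        return (merged, MATCH_TABLE[tuple(cats)]), p + n
    return (units[p], CATEGORY.get(units[p], "unknown")), p + 1


units = ["26", "Jan.,", "'80", "he"]
entry, next_p = combine(units, 0)
```

Calling `combine(units, next_p)` afterwards handles "he" as a single unit, which corresponds to the flowchart returning to cut out the next glossary reference unit.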

The eighth embodiment of the present invention will now be explained.

Figure 63 illustrates the overall structure of the eighth embodiment of the language analysis device according to the present invention, applied to an apparatus for automatic translation from English into Japanese.

Figure 63 shows the input section 8001, the English text 8002, the pre-editing section 8003, the morphological analysis section 8004, the structure analysis section I 8005, the structure analysis section II 8006, the operation display section 8007, the glossary 8008, the analysis rule file 8009, the control section 8010, the structure transformation section 8011, the translated sentence generation section 8012, the post-editing section 8013, the output section 8014 and the Japanese sentence 8015. As illustrated in the figure, the translation apparatus has the input section 8001, through which the English text 8002 to be translated into Japanese is entered. The input section 8001 may include, for example, a keyboard with character keys such as alphanumeric keys and function keys, an optical character reader (OCR) for reading English text, and/or a file storage device for reading English text recorded on a storage medium such as a magnetic disk.

The text entered through the input section 8001 is read into the pre-editing section 8003, where preprocessing for the translation is carried out. This consists mainly of sentence recognition and the handling of unknown words. These functions form part of the morphological analysis.

The pre-edited English data are sent, together with the information obtained during pre-editing, to the morphological analysis section 8004. The section 8004 segments the data by reference to the glossary 8008, analyzes the English morphemes, performs various kinds of classification such as the handling of unknown words, proper-noun expressions, time expressions and numbers, and performs processing on the sentence as a whole such as the recognition of additive (tag) questions and standard expressions. The analysis rules are stored in the analysis rule file 8009.

After the morphological analysis, the English data are transferred, together with the glossary information obtained during the morphological analysis, to the structure analysis section I 8005. The structure analysis section I 8005 is a functional section that parses the surface structure of the sentence by applying grammatical rules to the English data and finding all structural possibilities.

After the analysis in the structure analysis section I 8005, the English data are sent, together with the analysis information, to the structure analysis section II 8006, where one solution is selected, by applying a syntax analysis, from the results of the surface structure analysis produced by the structure analysis section I. In this way a plausible analysis structure of the English description is prepared so as to build up its structure. These analysis rules are also stored in the analysis rule file 8009.

The English data that have undergone structure analysis are supplied, as analysis structure data, to the structure transformation section 8011. The structure transformation section 8011 prepares a corresponding Japanese analysis structure, which serves as an intermediate structure of the English sentence, in order to convert it into a Japanese underlying structure from which the Japanese sentence can readily be generated.

The analysis structure data transformed in this way, which represent the Japanese underlying structure, are sent to the translated sentence generation section 8012, in which the translated sentence is generated. This is a functional section for generating a Japanese sentence from the Japanese analysis structure of the sentence.

The Japanese sentence data translated in this way, that is, the data for the translated sentence, are sent to the post-editing section 8013. The post-editing section 8013 modifies the translated data by looking up the glossary 8008, making use of the information used in the translation procedure, in order to obtain a more natural Japanese sentence. The data for the Japanese sentence are transferred to the output section 8014 and delivered through it as the translated Japanese sentence 8015. The output section 8014 includes, for example, a printing unit, a display unit and/or a file storage device such as a magnetic disk storage device.
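The chain of sections described above can be pictured as a simple pipeline in which each section receives the data together with the information gathered so far. The following is only a structural sketch: the `make_stage` and `translate` names and the dictionary layout are assumptions for illustration, and the stages merely record that they ran instead of performing any analysis.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it was executed."""
    def stage(data: Dict) -> Dict:
        # A real section would transform data["payload"]; here it is untouched.
        return {**data, "trace": data["trace"] + [name]}
    return stage


# Order of the sections as described for Figure 63.
PIPELINE: List[Stage] = [make_stage(name) for name in (
    "pre-editing 8003",
    "morphological analysis 8004",
    "structure analysis I 8005",
    "structure analysis II 8006",
    "structure transformation 8011",
    "translated sentence generation 8012",
    "post-editing 8013",
)]


def translate(english_text: str) -> Dict:
    """Run the text through every stage in order (the role of control section 8010)."""
    data = {"payload": english_text, "trace": []}
    for stage in PIPELINE:
        data = stage(data)
    return data


result = translate("This is a pen.")
```

Passing one dictionary from stage to stage reflects the description that each section forwards its data "together with" the information it obtained.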

The flow of this series of translation operations is controlled by the control section 8010, which controls the entire apparatus. In the illustrated embodiment the glossary 8008 contains glossary data for English and Japanese words which, in addition to the vocabulary itself, hold various kinds of information such as co-occurrence (that is, words that occur together), meanings, singular or plural forms, parts of speech, etc. The analysis rule file 8009 contains rule data for the morphological analysis and for the syntax analysis of the English sentence.

The control section 8010 is connected to the operation display section 8007. The operation display section 8007 includes operation keys with which an operator gives various instructions to the apparatus, for example a translation instruction key, cursor keys, etc., and a display or indicator unit on which the entered English sentence, the Japanese sentence resulting from the translation, intermediate data such as glossary information, and various instructions to the operator can be shown. The apparatus may also be arranged so that most of these operation display functions are realized by a keyboard provided in the input section 8001 or by the display unit provided in the output section 8014.

As indicated above, this embodiment of the present invention is used as an automatic translation apparatus and is designed so that, when a derived word appears in the English text 8002, its grammatical type, semantic type, translated word, etc. are estimated according to the circumstances under which the word is recognized as a derived word by means of a prefix or the like, thereby increasing the reliability of the analysis result obtained for the translation. A prefix glossary is used to estimate glossary information for unknown words in the morphological analysis. Three kinds of processing are required, namely processing for the prefix, processing of the prefix before the suffix, and estimation processing for the suffix. The data, however, are essentially of two types, namely prefix estimation data and suffix estimation data.

First, the prefix estimation data and suffix estimation data mentioned above will be explained.

(1) Prefix estimation data

If the prefix portion of a word not registered in the glossary matches one of the prefixes described below and the remaining portion is present in the glossary, the word is handled according to its stem as found in the glossary. An internal attribute can be added to the stem's original set of internal attributes, and a Japanese prefix can be added to the original translated word. For the input word "electrochemical", for example, an entry "electrochemical" can be created in the glossary information retention table from the glossary data for the prefix entry "electro" and for the glossary entry "chemical", as shown below:

    prefix entry        glossary entry
    electro             chemical

    input word          entry in the glossary information retention table
    electrochemical     electrochemical

Note that the entry in the glossary information retention table inherits all the glossary information of the entry whose word matches the stem.
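A rough sketch of prefix estimation for the "electrochemical" example is given below. The `Entry` layout, the attribute names and the romanized Japanese strings ("denki-", "kagaku-no") are illustrative assumptions; the source does not specify how glossary entries are represented.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    word: str
    pos: str                       # part of speech
    attributes: Tuple[str, ...]    # internal attributes
    translations: Tuple[str, ...]  # translated words, romanized here


# Hypothetical glossary and prefix glossary contents.
GLOSSARY = {
    "chemical": Entry("chemical", "adjective", ("substance",), ("kagaku-no",)),
}
PREFIX_GLOSSARY = {
    # prefix -> (internal attribute to add, Japanese prefix to prepend)
    "electro": ("electric", "denki-"),
}


def estimate_by_prefix(word: str) -> Optional[Entry]:
    """Build a retention-table entry for an unregistered word from its stem."""
    if word in GLOSSARY:
        return None  # registered words need no estimation
    for prefix, (attribute, ja_prefix) in PREFIX_GLOSSARY.items():
        stem = word[len(prefix):]
        if word.startswith(prefix) and stem in GLOSSARY:
            base = GLOSSARY[stem]
            # Inherit everything from the stem, then add the prefix information.
            return replace(
                base,
                word=word,
                attributes=base.attributes + (attribute,),
                translations=tuple(ja_prefix + t for t in base.translations),
            )
    return None


entry = estimate_by_prefix("electrochemical")
```

Copying the stem entry and changing only a few fields mirrors the remark that the new entry inherits all glossary information of the stem.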

(2) Suffix estimation data

If the suffix portion of a word not registered in the glossary matches one of the suffixes described below and the remaining portion is present in the glossary, the word is registered by creating new glossary data in accordance with the information given in the suffix glossary. In this case the first translated word in the glossary data corresponding to the remaining portion of the word is retrieved and used as the translated word in the new glossary data.

For the input word "controler", for example, an entry "controler" is registered in the glossary information retention table on the basis of the suffix entry "-(e)r" and the glossary entry "control".

    suffix entry        glossary entry
    (verb)-(e)r         control
    noun                verb

    input word          entry in the glossary information retention table
    controler           controler
                        noun

Next, the processing of a derived word and the processing of an unknown word will be explained.

(1) For a word not registered in the glossary, if a prefix or suffix is present at the beginning or end of the word and the remaining portion of the word is registered in the glossary, the English part-of-speech information, the internal attributes and the Japanese translated word are synthesized on the basis of the glossary information and the suffix information.

(2) The prefixes and the suffixes are each listed in a table and can be edited independently of the program.

(3) The prefix possibilities are tried first; if this fails, the suffix possibilities are tried. If both are present, no attempt is made.

(4) If estimation does not succeed for a word, the word is processed as an unknown word on the basis of its ending.
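The order of attempts in points (1) to (4) might be sketched as follows. The word lists and the `classify` function are hypothetical, and the sketch adopts one possible reading of point (3), namely that a word carrying both a listed prefix and a listed suffix is not estimated at all.

```python
# Hypothetical tables; per point (2) they are plain data, editable
# independently of the program logic.
PREFIXES = ("electro",)
SUFFIXES = ("er",)
GLOSSARY = {"chemical", "control"}


def classify(word: str) -> str:
    """Decide how a word is handled: registered, estimated, or unknown."""
    if word in GLOSSARY:
        return "registered"
    prefix = next((p for p in PREFIXES if word.startswith(p)), None)
    suffix = next((s for s in SUFFIXES if word.endswith(s)), None)
    if prefix and suffix:
        return "unknown"              # assumed reading of point (3)
    if prefix and word[len(prefix):] in GLOSSARY:
        return "prefix estimate"      # the prefix is tried first
    if suffix and word[:-len(suffix)] in GLOSSARY:
        return "suffix estimate"      # then the suffix
    return "unknown"                  # point (4)


results = {w: classify(w) for w in
           ("control", "electrochemical", "controler", "electrocontroler", "xyz")}
```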

The eighth embodiment will now be explained in more detail with reference to the figures.

Figure 58 is a block diagram explaining an embodiment of the present invention. The figure shows the input processing section 8020, the unit cutout section 8021, the delimiter table 8022, the derived-word processing section 8023, the reference glossary 8024 and the glossary information retention table 8025. First, an English sentence is read into the input processing section from an input unit comprising an input document file, a keyboard, an OCR device, etc. Next, glossary reference units are cut out in the unit cutout section with reference to the delimiter table and, as long as the end has not been reached, a glossary lookup is performed using the reference glossary.

If the lookup finds an entry, the lookup result is registered in the glossary information retention table; if no entry is found, derived-word processing is performed.
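The flow of Figure 58 could be sketched as below. The delimiter set, the glossary contents and the function names are assumptions for illustration only; in particular, the sketch simply discards delimiters, which the source does not state.

```python
from typing import Callable, List, Tuple

DELIMITERS = {" ", ",", "."}                     # hypothetical delimiter table 8022
GLOSSARY = {"the": "article", "valve": "noun"}   # hypothetical reference glossary 8024


def cut_units(sentence: str) -> List[str]:
    """Cut the sentence into glossary reference units at delimiters."""
    units, current = [], ""
    for ch in sentence:
        if ch in DELIMITERS:
            if current:
                units.append(current)
                current = ""
        else:
            current += ch
    if current:
        units.append(current)
    return units


def analyze(sentence: str, derive: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Look up each unit; fall back to derived-word processing when absent."""
    retention_table = []                         # stands in for table 8025
    for unit in cut_units(sentence):
        info = GLOSSARY.get(unit.lower())
        if info is None:
            info = derive(unit)                  # derived-word processing 8023
        retention_table.append((unit, info))
    return retention_table


table = analyze("The valve controler.", lambda unit: "needs derivation")
```

Passing the derived-word handler in as a function keeps the lookup loop independent of how prefixes and suffixes are actually processed.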

Figure 59 is a block diagram explaining an embodiment of derived-word processing by means of a prefix. The figure shows the matching section 8030 between the beginning of the word and the prefix glossary, the prefix glossary 8031, a glossary lookup section 8032 for looking up the portion other than the prefix, a matching section 8033 between the part of speech in the prefix glossary and the part of speech of the entry, a glossary information preparation section 8034 based on prefix estimation, and a derived-word processing section 8035 based on the prefix.

Figure 60 is a block diagram illustrating an embodiment of the derivative processing by means of the suffix. The figure shows a matching section 8040 for matching the end portion against the suffix dictionary, the suffix dictionary 8041, a processing section 8042 for completely unknown unregistered words, a dictionary lookup section 8043 for looking up the portion excluding the suffix, a matching section 8044 for matching the part of speech of the stem in the suffix dictionary against the part of speech of the entry, a dictionary information preparation section 8045 based on suffix estimation, a dictionary lookup section 8046 that performs the lookup after adding the stem change given in the suffix dictionary to the portion excluding the suffix, and a processing section 8047 for unregistered words based on suffix estimation. First, the end portion is matched against the suffix dictionary. If no match is found, the word is processed as a completely unknown unregistered word; if a match is found, a dictionary lookup is performed for the portion excluding the suffix. If that lookup finds no entry, a second lookup is performed in which the stem change from the suffix dictionary is added to the portion excluding the suffix. If this also finds no entry, the word is processed as an unregistered word by means of suffix estimation. If, on the other hand, an entry is found, the part of speech of the entry is matched against the part of speech of the stem in the suffix dictionary, in the same way as when the lookup for the portion excluding the suffix finds an entry. If they match, the dictionary information is prepared by means of suffix estimation; if they do not match, the word is processed as an unregistered word by means of suffix estimation.
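The decision flow of Figure 60 can be sketched in Python as follows. This is a minimal illustration only: the `SuffixRule` fields, the sample dictionaries and the returned status labels are assumptions made for the example and are not taken from the patent; only the order of the checks follows the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    suffix: str       # ending matched against the end portion of the word
    stem_change: str  # string added to the remainder for the second lookup
    stem_pos: str     # part of speech the stem is expected to have
    derived_pos: str  # part of speech estimated for the derived word


# Hypothetical sample data, for illustration only.
SUFFIX_DICT = [
    SuffixRule("ness", "", "adjective", "noun"),
    SuffixRule("able", "e", "verb", "adjective"),
]
WORD_DICT = {"kind": "adjective", "move": "verb", "table": "noun"}


def analyze_by_suffix(word):
    """Return (status, estimated part of speech) for a word not found as-is."""
    # 8040: match the end portion against the suffix dictionary.
    rule = next(
        (r for r in SUFFIX_DICT
         if word.endswith(r.suffix) and len(word) > len(r.suffix)),
        None,
    )
    if rule is None:
        return ("unknown", None)  # 8042: completely unknown word
    remainder = word[: -len(rule.suffix)]
    pos = WORD_DICT.get(remainder)  # 8043: look up the portion without the suffix
    if pos is None:
        # 8046: retry with the stem change added to the remainder.
        pos = WORD_DICT.get(remainder + rule.stem_change)
    if pos is None:
        return ("unregistered", rule.derived_pos)  # 8047
    if pos == rule.stem_pos:  # 8044: compare parts of speech
        return ("derived", rule.derived_pos)  # 8045
    return ("unregistered", rule.derived_pos)  # 8047
```

With the sample data, `analyze_by_suffix("movable")` strips "able", fails to find "mov", finds "move" after adding the stem change "e", and reports a derived adjective.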

Figure 61 is a block diagram showing details of the overall structure obtained by combining the parts of Figures 58, 59 and 60, and Figure 62 illustrates details of the section 8042 for processing completely unknown unregistered words shown in Figure 61. The processing section 8042 for completely unknown unregistered words, shown in Figure 61, comprises a section 8050 for preparing estimated information for a noun and a section 8051 for preparing estimated information for a verb. However, since the parts in Figures 61 and 62 have already been described in detail, further explanation of them is omitted.

The ninth embodiment of the device according to the invention will now be described.

Figure 64 illustrates the overall structure of the ninth embodiment of the present invention, applied to a device for automatic translation from English into Japanese.

This embodiment has an input section 9010 through which an English text 9012 to be translated into Japanese is entered. The input section 9010 may comprise a keyboard with character keys, such as alphanumeric keys and function keys, an optical character reader (OCR) for English text recorded on paper, and/or a file storage device for reading English text recorded on a storage medium such as a magnetic disk.

The English text entered via the input section 9010 is read into a pre-editing section 9014, where preprocessing for the translation is carried out. This consists mainly of sentence recognition and unknown-word processing. These functions form part of the morphological analysis.

The pre-edited English data, together with the information obtained during pre-editing, is passed to a morphological analysis section 9016. Using the dictionary 9018, section 9016 segments the sentence, analyzes the English morphemes, performs various classifications such as the handling of unknown words, proper nouns, time expressions and numerical expressions, and also performs operations on the sentence as a whole, such as searching for idioms and standard expressions. The morphological analysis rules are held in the analysis rule file 9036.

The English data that has undergone morphological analysis is passed, together with the dictionary information obtained from that analysis, to an analysis section I 9020. In this embodiment, the analysis section I 9020 is a functional section that analyzes the surface structure bottom-up and from left to right by applying the cfg grammar rules to the English data and finding all structural possibilities.

After analysis in the analysis section I 9020, the English data is sent together with the analysis information to the analysis section II 9022. This section selects one solution from the results of the surface structure analysis in analysis section I by applying a syntax analysis. In this way a plausible analysis structure of the English sentence is prepared, which establishes the structure of the sentence. These analysis rules are likewise stored in the analysis rule file 9036.

After the analysis, the English data is supplied as analysis structure data to the structure transformation section 9024. In the structure transformation section 9024, the corresponding analysis structure of the Japanese sentence is prepared: the analysis structure of the English sentence, which serves as an intermediate structure, is transformed into a Japanese underlying structure from which the Japanese sentence can easily be generated.

The analysis structure data, which after this transformation represents the Japanese underlying structure, is supplied to a translation generation section 9026, in which the translated sentence is generated. This function generates a Japanese sentence from the Japanese analysis structure. First, sentence structure generation reorders the analysis structure so that it conforms to the Japanese language; then morpheme generation produces the translated sentence by traversing the analysis structure of the sentence from top to bottom and from left to right.

The data of the Japanese sentence generated in this way, that is, the translation data, is sent to a post-editing section 9030. The post-editing section 9030 modifies the translation data by looking up the dictionary 9018, using the information that was used during the translation process, in order to obtain a more natural Japanese sentence. The Japanese sentence data is then passed to an output section 9032, which outputs it as the translated Japanese sentence 9034. The output section 9032 comprises, for example, a printer, a display unit and/or a file storage device such as a magnetic disk.

The flow of this series of translation operations is controlled by a control section 9038, which handles the overall control of the device. The dictionary file 9018 contains dictionary data for English and Japanese words, and the analysis rule file 9036 contains the rule data for the morphological analysis and the syntax analysis in this embodiment.
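The order in which the control section 9038 passes data through the functional sections of Figure 64 can be sketched as a simple pipeline. The sketch below is an assumption-laden illustration: the stage bodies are placeholders that only record that they ran, and the names `make_stage`, `PIPELINE` and `control_section` are hypothetical; only the reference numerals and their order come from the description.

```python
def make_stage(label):
    """Build a placeholder stage that only records that it was visited."""
    def run(data):
        # A real section would transform the data; this stub appends to a trace.
        return {**data, "trace": data["trace"] + [label]}
    return run


# Reference numerals and order as described for Figure 64.
PIPELINE = [
    (9014, make_stage("pre-editing")),
    (9016, make_stage("morphological analysis")),
    (9020, make_stage("analysis I")),
    (9022, make_stage("analysis II")),
    (9024, make_stage("structure transformation")),
    (9026, make_stage("translation generation")),
    (9030, make_stage("post-editing")),
]


def control_section(english_text):
    """Stand-in for control section 9038: run every stage in sequence."""
    data = {"text": english_text, "trace": []}
    for _numeral, stage in PIPELINE:
        data = stage(data)
    return data
```

Keeping the stages in one ordered list mirrors the strictly sequential hand-over of data, with accompanying information, from one section to the next.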

The control section 9038 is connected to an operation display section 9040. The operation display section 9040 comprises operation keys with which the operator of the device gives various instructions, for example a translation instruction key or cursor keys, and a display or indicator on which the entered English text, the Japanese sentences resulting from the translation, intermediate data such as dictionary information, and various instructions to the operator can be shown. The device may also be arranged so that these functions are provided by the keyboard, if one is fitted to the input section 9010, or by a display located at the output section 9032.

In the analysis section I 9020, the cfg grammar rules are applied to the English data after the morphological analysis, bottom-up and from right to left, in order to find all possible solutions for the sentence structure. The solutions are generally expressed in the form of an analysis structure. This shows how the words or groups in each sentence are related to one another in a subordinate or coordinate relation, such as a modifying relation or a case relation, for example a dependency relation such as parent, child, grandchild, and so on. Each of the words or groups is located at a node of the analysis structure.

In this embodiment, the form and vocabulary of a sentence are examined prior to the syntax analysis in order to identify groupings within the sentence structure. Two kinds of grouping in the sentence structure are referred to: the "unit" and the "block".

A "unit" is a group of words that forms the minimum unit of the translation process. A unit is treated exactly like a single word in the analysis procedure, and the dictionary information of the individual elements it contains is not used.

A "block" is a structural grouping whose interior is analyzed in preference to its exterior, and which is treated like a unit as far as its exterior is concerned. It may be, for example, a phrase or a clause, or a part corresponding to an intermediate symbol used in the cfg grammar. Blocks may also be nested, that is, a block may itself contain a further block. The concept of a block can also extend to a sentence, a paragraph or several sentences, each of which is regarded as a block. Processing that gives priority to such a partial analysis is referred to below as "partial analysis". It allows a number of superfluous structural solutions of the kind described above to be excluded and improves the efficiency of the analysis process, resulting in a more plausible analysis result.

In this embodiment, two kinds of symbol are defined for a block. One is a symbol in the cfg rules, referred to in this description as the "goal", which must be reached as the result of the analysis performed on the constituent elements within the block; that is, a symbol describing the structure or properties of the block. The other is a symbol in the cfg rules referred to as the "role", which is assigned to the block when the exterior of the block is analyzed within the sentence, phrase or clause that contains it; that is, a symbol indicating the relation between the block and other blocks.

For example, in the English sentence

I said, "White house isn't white"

the quoted part forms a block whose goal is a sentence and whose role is a noun (clause). Although goal and role are identical in most cases, they can sometimes differ, as in this example.
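The distinction between goal and role, and the nesting of blocks, can be illustrated with a small data structure. This is a sketch under stated assumptions: the class `Block`, the helper `external_symbols` and the choice to model the whole sentence as an outer block with role "sentence" are hypothetical and serve only to show how a nested block is seen from outside through its role.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Block:
    goal: str  # symbol the analysis of the block's interior must reach
    role: str  # symbol the block carries when its surroundings are analyzed
    parts: List[Union[str, "Block"]] = field(default_factory=list)

    def depth(self) -> int:
        """Nesting depth: 1 for a block that contains no further block."""
        inner = [p.depth() for p in self.parts if isinstance(p, Block)]
        return 1 + max(inner, default=0)


def external_symbols(block: Block) -> List[str]:
    """View of a block's interior in which each nested block appears as its role."""
    return [p.role if isinstance(p, Block) else p for p in block.parts]


# The quoted sentence: analyzed internally as a sentence, used externally as a noun.
quoted = Block(goal="sentence", role="noun",
               parts=["White", "house", "isn't", "white"])
# Assumption: the whole sentence is modelled as an outer block.
whole = Block(goal="sentence", role="sentence",
              parts=["I", "said", ",", quoted])
```

Here `external_symbols(whole)` yields the words of the outer sentence with the quotation collapsed to the single symbol `"noun"`, which is how the block is treated like a unit from the outside.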

When, in the embodiment shown in Figure 64, a structural grouping of the English sentence is recognized as a block, the functional sections that estimate its goal and its role can be summarized in the structure shown in Figure 65. As the figure shows, the structural groupings in the English sentence data pre-edited in the pre-editing section 9014 are identified in the morphological analysis section 9016 using the dictionary 9018 and the analysis rule file 9036.

The dictionary 9018 contains dictionary information for English words and phrases. As shown for example in Figure 68, in this embodiment entries are provided for all variant forms of each word, and all possible information is spelled out. As part-of-speech information, for example, a large number of items can be provided, as shown in the figure. It will be clear that the way in which the dictionary 9018 is organized is not limited to this one embodiment.

The analysis rule file 9036 contains, in the form of a table, the start position that marks the beginning of a block, the end position that marks its end, and block processing information that assigns a goal and a role to the block. An example is shown in Figure 69. A block begins, for example, at ", conjunction" and ends at the end of the sentence. One block is therefore formed from the beginning of the sentence up to immediately before the conjunction; its goal is a clause and its role is a sentence. A further block is formed from the conjunction to the end of the sentence, in which both the goal and the role are a clause.

A block also begins at "relative preposition" and ends at [illegible] or at the end of the sentence. It is thus possible to allow several end conditions for one start condition. If the block is ended by [illegible], the cluster from immediately before the relative preposition up to the next [illegible] forms a block in which the goal is a clause and the role is an adverb or adjective. That is, the cluster functions as an adverbial clause or an adjectival clause. If the end coincides with the end of the sentence, the cluster from immediately before the relative preposition to the end of the sentence forms a block in which the goal is a clause and the role is an adverb or adjective. This is in accordance with the conditions under which a phrase, clause or sentence is formed in ordinary modern English sentences. In the figure, the symbol [illegible] denotes a space.

In the morphological analysis section 9016, the English text received from the pre-editing section 9014 is first divided into sentences, which are regarded as the translation units. At this stage misspellings and unregistered words are detected. The dictionary 9018 is consulted for each sentence unit, and the dictionary information for each of its constituent parts is retrieved. Various kinds of grouping are then applied in accordance with this dictionary information.

Figure 66 shows a flow chart of the grouping of a block as performed in the morphological analysis section 9016. First, a pointer indicating the read position in an English sentence is set to the beginning (9100). The start position does not coincide with the first word itself but with the (imaginary) end of the immediately preceding sentence. The word cut-out operation 9101 is performed at that position. As shown in Figure 67, in the word cut-out operation 9101 a word is cut out by advancing the position one place at a time (9111), unless the end of the sentence has been reached (9110), and the dictionary 9018 is consulted for that word (9112) to obtain the word information (9113).

Once the word information has been obtained in the word cut-out operation 9101, the table 9036 of block start and end conditions is consulted to determine whether or not the word matches a start position (9102). Steps 9101 and 9102 are repeated until a word matching a start position is detected.

When a match with a start position has been found, the next word and as many following words as required are fetched and checked against the start position of the block (9104). Where necessary, the dictionary is consulted for each of these words. The position pointer is not advanced during this check.

If a match with the start position of the block is found in step 9104, a word matching the block end position is looked for, irrespective of the start position (9105). Steps 9104-9106 are repeated until a word matching the end position is found. If a word satisfies the end condition (9106), the cluster including that word is recognized as a block and the block is registered (9107). More specifically, a block is created on the judgment that the block processing condition is satisfied at the position where the end condition is first met. Then, with reference to the block processing information table 9036, the position of the word indicated by the pointer at the point where its advance was stopped in operation 9103 is defined as the start position of the block, and the position of the first subsequent word that satisfies the end condition is defined as the end position of the block. At the same time, the goal and the role of the block are registered.
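The scan described in steps 9104-9107 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the names `Block` and `recognize_block`, the representation of the start and end conditions as Python callables, and the tokenization of the example sentence are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Block:
    start: int  # 1-based position where the pointer stopped (start of block)
    end: int    # 1-based position of the first word meeting the end condition
    goal: str
    role: str


def recognize_block(
    words: List[str],
    pointer: int,                                   # 0-based index where the pointer stopped
    start_cond: Callable[[List[str], int], bool],   # assumed form of the start condition
    end_cond: Callable[[str], bool],                # assumed form of the end condition
    goal: str,
    role: str,
) -> Optional[Block]:
    """Return a Block if the start condition holds at `pointer` and a later
    word satisfies the end condition; the pointer itself is not advanced."""
    if not start_cond(words, pointer):              # corresponds to step 9104
        return None
    for i in range(pointer + 1, len(words)):        # steps 9105-9106
        if end_cond(words[i]):
            # Step 9107: register the block with its goal and role.
            return Block(pointer + 1, i + 1, goal, role)
    return None


# Assumed tokenization in which the quotation marks occupy positions 4 and 10.
words = ["I", "said", ",", '"', "White", "house", "is", "n't", "white", '"']
block = recognize_block(
    words, 3,
    start_cond=lambda ws, i: ws[i] == '"',
    end_cond=lambda w: w == '"',
    goal="optional", role="optional",
)
print(block)
```

The end position is taken as the first word after the pointer that satisfies the end condition, which mirrors the statement that the block is created where the end condition is first met.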

As a result of such block recognition, if "..., conjunction ..." occurs in an English sentence, as shown for example in Figure 70, the cluster from the beginning of the sentence up to the part preceding ", conjunction" is recognized as one block, while the cluster from ", conjunction" to the end of the sentence is recognized as another block. In the figure, ( ) indicates the interior of a block. In this block, both the goal and the role are a sentence. Furthermore, the cluster from the word after the conjunction to the end of the sentence forms another block, in which the goal and the role are likewise both a sentence.

Alternatively, the cluster from the conjunction to the end of the sentence can be defined as a block. In that case, the goal is a sentence and the role is an adverb.

The block can also be defined as beginning at a position that is not marked by ",". Furthermore, punctuation marks and the like can be excluded from the analysis target when the information contained in the block is determined.

In the same way, "... relative preposition ..." and ", relative preposition ..." can be recognized as a block. In this block, the goal is a phrase or a sentence and the role is an adverb or an adjective.

Blocks can of course be nested.

If, for example, the English sentence has the structure "(beginning of the sentence) ..., conjunction, ..., relative preposition ... (end of the sentence)", as shown in Figure 71, the cluster from ", conjunction" to the end of the sentence forms a block BL1-BL1, within which ", relative preposition ...," is present as another block BL2-BL2.

In this way, the morphological analysis section 9016 discriminates the features of the sentence with respect to form and vocabulary in order to distinguish the structural constituents as blocks. In addition to such block recognition, the morphological analysis section 9016 also performs various classifications, covering proper-noun expressions, derivatives, unknown words, abbreviated words, numerical expressions, time expressions and apostrophes ('), and searches for idioms and fixed expressions, in order to prepare the morphological analysis data.

The English sentence subjected to morphological analysis in this way is transferred, together with the analysis information, to the analysis section I 9020. Figure 72 shows an example of the output data. The figure shows the result obtained when the English sentence "I said, "White house isn't white"" is entered from the input section and analyzed in the morphological analysis section 9016. Block 1 begins at word position 4 and ends at position 10; both its goal and its role are optional in this case. Similarly, block 2 begins at position 5 and ends at position 6; its goal is a noun group and its role is a proper noun. That is, the block "White house isn't white" contains within itself another, nested block, namely "White house". In the smaller block "White house", each of the internal constituents functions as a proper noun, while the block as a whole occupies the position of a noun phrase relative to its surroundings, that is, relative to "isn't white". "White house" can be treated as a unit.

Together with such block information, word information is retrieved from the dictionary 9018, appended, and sent from the analysis section 9016 to the analysis section I 9020.

The analysis section I 9020 analyzes the surface structure of the English sentence by applying context-free grammar rules stored in the analysis rule file 9036, in order to find all possible analysis trees. If a block is included, the partial analysis described above is carried out, with priority given to the local analysis. This can improve both the efficiency and the accuracy of the analysis. More specifically, the block inclusion relation is determined from the positional information of the blocks. The innermost block is accordingly analyzed first. A block whose analysis has been completed is regarded as a unit, and its interior is not processed further.
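How the inclusion relation might be derived from positional information alone can be sketched as below. The function names and the dictionary layout are hypothetical; the only data assumed from the text are the start and end positions of blocks 1 and 2 given for Figure 72.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end), 1-based and inclusive


def nesting_depths(blocks: Dict[int, Span]) -> Dict[int, int]:
    """For every block, count how many other blocks enclose it."""
    def contains(outer: Span, inner: Span) -> bool:
        return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]

    return {
        bid: sum(contains(other, span)
                 for oid, other in blocks.items() if oid != bid)
        for bid, span in blocks.items()
    }


def innermost_first(blocks: Dict[int, Span]) -> List[int]:
    """Order block ids so that the most deeply nested block comes first."""
    depths = nesting_depths(blocks)
    return sorted(blocks, key=lambda bid: -depths[bid])


# Positions as stated for Figure 72: block 1 spans 4-10, block 2 spans 5-6.
figure_72_blocks = {1: (4, 10), 2: (5, 6)}
print(nesting_depths(figure_72_blocks))
print(innermost_first(figure_72_blocks))
```

Sorting by nesting depth is one simple way to obtain an innermost-first order; any ordering that places an enclosed block before the block enclosing it would serve the same purpose.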

In this way, the region being analyzed is gradually extended outward to the outermost blocks, until finally the entire sentence has been analyzed. The analysis is carried out on the basis of context-free grammar (CFG) rules, bottom-up and from left to right through the English sentence.

The analysis is carried out in this manner because all grammatically permitted possibilities are thereby retained.

Figure 73 shows an example of the flowchart of such an analysis operation. First, based on the English data supplied to the analysis section I 9020, all structural constituents of a sentence are recognized as blocks, and their goal and role are estimated (9120). The manner of grouping is illustrated in Figure 70. If no block is then present in such a constituent (9121), the sentence is analyzed (9125), only a solution in which the sentence is collected as a whole into a single symbol is selected, and the analysis of the sentence ends (9126). If the processing scheme that treats the entire sentence as a block is used, operations 9125 and 9126 are covered by operations 9121-9124 and are therefore not necessary.

If a block is present, the innermost block is analyzed first (9122). In the example shown in Figure 71, the interior of block BL2-BL2 is analyzed. Although several solutions are generally obtained during the analysis, the solution selected from among them is the one in which the block is collected as a whole into a single CFG symbol and which satisfies the goal of the block. For a block whose goal is optional, all solutions collected into a single symbol are selected. The block selected in this way is then treated as a single constituent having the role of the block (9124). For a block with an optional role, the role of the symbol collected in operation 9123 is defined as the role. Operations 9121-9124 are repeated in sequence.

In this way, in the example of Figure 71, the interior of block BL2-BL2 is analyzed first, and then the interior of block BL1-BL1 is analyzed. In the latter analysis, block BL2-BL2 is treated in the same way as a single word, and its individual constituents are not analyzed.
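The inside-out procedure of operations 9121-9124 can be illustrated with the following sketch. It does not implement the CFG analysis itself: the callable `analyze` is a hypothetical stand-in that receives the units inside a block together with the block's goal and returns a single symbol. The function name, the tuple layout and the sample positions are likewise assumptions made for illustration.

```python
from typing import Callable, List, Tuple

BlockSpec = Tuple[int, int, str]   # (start, end, goal), 1-based and inclusive
Unit = Tuple[int, int, str]        # (start, end, label) in original word positions


def analyze_blocks(
    words: List[str],
    blocks: List[BlockSpec],
    analyze: Callable[[List[str], str], str],
) -> List[str]:
    """Collapse properly nested blocks from the inside out.

    Each analyzed block is replaced by one unit, so an enclosing block
    sees it as if it were a single word.
    """
    units: List[Unit] = [(i, i, w) for i, w in enumerate(words, start=1)]
    # A nested block is always shorter than the block enclosing it,
    # so sorting by span length processes inner blocks first.
    for start, end, goal in sorted(blocks, key=lambda b: b[1] - b[0]):
        inside = [u for u in units if start <= u[0] and u[1] <= end]
        symbol = analyze([u[2] for u in inside], goal)
        units = [u for u in units if u not in inside]
        units.append((start, end, symbol))
        units.sort()
    return [u[2] for u in units]


# Hypothetical layout in the spirit of Figure 71: BL2 lies inside BL1.
sample_words = list("abcdefgh")
sample_blocks = [(2, 8, "BL1"), (5, 7, "BL2")]
result = analyze_blocks(
    sample_words, sample_blocks,
    analyze=lambda labels, goal: f"{goal}({' '.join(labels)})",
)
print(result)
```

Because units keep their original word positions, collapsing an inner block does not shift the positions used to locate the outer block.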

When the data defining the structural configuration and the mutual dependency relations have been obtained in this way, they are sent to the analysis section II 9022. The data can readily be recognized in the form of an analysis tree in the manner described above. The data are further transformed into the structure of the Japanese sentence in the structure transformation section 9024, and in the translation generation section 9026 the translated sentence is generated for each of the nodes present therein. Node processing in the analysis tree is carried out top-down and from left to right.

The translated sentence generated in this way is subjected to post-processing in the post-processing section 9030, displayed in the operation display section 9040 and, for example, printed as a Japanese sentence 9034 in the output section 9032.

In this way, according to this embodiment, the features of the English sentence with respect to form and vocabulary are discriminated in order to distinguish the structural constituents as blocks. For each block, the goal, which is what the analysis result can be, and the structural role, by which the block functions relative to its surroundings, are estimated. The surface structure of the English sentence is then analyzed by applying context-free grammar rules in order to find all possible analysis trees. This makes it possible to reduce the number of useless solutions, to improve the efficiency of the analysis, and to obtain a more reliable analysis result.

Because adjectival expressions occur in various patterns, it is difficult to recognize them during parsing, particularly in a context-free type of analysis. Because it is also generally difficult to recognize adjectival expressions after parsing, an ambiguous translation is unavoidable. Furthermore, if a rule were drawn up that did allow recognition, the risk would arise that expressions which are not adjectival expressions are recognized as such, or that a very large number of possible combinations results. That is, superfluous local analysis would be carried out between the parts within the adjectival expression and other parts.

In view of the above, the present invention reduces the processing load in the analysis step by recognizing the adjectival expression from the features of the sentence, taking into account the form or the semantic nature of the words. The adjectival expression is estimated by recognizing the following patterns as a block.

For the English sentence structure "..., ..., relative pronoun ...", the relative pronoun is recognized by assigning a specific code, for example "R", to the part-of-speech code of the word. In this case, the interior enclosed by the two "," is recognized as a block, provided that it does not cross a block or unit indicated during pre-editing and that the part after the second "," does not contain "and" or "or". For the English sentence structure "..., relative pronoun ... .", the interior enclosed by "," and "." is regarded as a block. The period may be any other symbol that can be used to end a sentence.

To carry out such estimation of an adjectival expression, the dictionary 9018 is constructed so that semantic information for the words is included in it. The semantic information indicates the distinction between article, place, person, etc., as shown in Figure 74. For the block processing conditions, the table 9036 is likewise arranged, as shown in Figure 75, so that the beginning of the block is recognized by "proper noun (person), noun (person)" as the start condition, and the beginning of the block is recognized by "proper noun (person), article U noun (person)". It is thus possible to estimate the adjectival expression from the morphemic and semantic properties without carrying out an analysis, and to carry out the analysis in accordance with that estimate; the other operations are as in the example shown in Figure 64.

In English sentences there are phrases that carry very particular information and are used only in a very restricted way. If these are analyzed in the same way as ordinary phrases, they are parsed in an entirely different manner, and it is difficult to recover the original nature of the sentence from the analysis. This often results in wasted processing.

The expressions "let's" or "let us" directly after a punctuation mark etc. are analyzed as an imperative with the causative verb "let", whereas they should be parsed as a phrase with the inviting sense of "let". "Let" also has various uses as a transitive verb and uses as a noun, and is not limited to its use in invitations, where it behaves like an auxiliary verb. Parsing must therefore be carried out for each of these possibilities, which reduces efficiency. Furthermore, it is difficult to give preference to the inviting use in the analysis result, because the sentence structure alone shows no difference between the use as a causative verb and the use as an inviting verb, so the two are hard to distinguish when only the sentence structure is known.

The wasted processing during the analysis can be reduced by excluding "let's" or "let us" directly after a punctuation mark from parsing. Moreover, by separating these expressions from the basic usage of the word, that is, the causative usage, the semantic analysis can be carried out easily.

If "please", "let's" or "let us" appears at the beginning of a block, a flag is set in the block information, and no unit information is output for any of these expressions. The English sentence "let's go to school", for example, is processed as <go to school> <combined with let's>.

To carry out such processing of "let", in a modification of this embodiment a let information processing section 9200 is provided between the morphological analysis section 9016 and the analysis section I 9020. Figure 77 illustrates the relevant sections together. In this figure, elements identical to those shown in Figure 64 are denoted by the same reference numerals.

Furthermore, the dictionary 9018 is arranged so that the let information for each word is stored in it. As shown in Figure 78, the let information is "0" for ordinary words, "1" for "let's" and "let us", and "2" for "please".

The let information processing section 9200 has the function of receiving the result of the morphological analysis, together with the English input sentence, from the analysis section 9016 and of adding the let information as additional information to the word information during the analysis, as shown in Figure 79. In this case, a block is set up for the sentence. In the example shown in the figure, block 0 is (start: 1, end: 10, goal: sentence, role: sentence). That is, in this example a block may be a sentence as well as a clause, a phrase, etc. The concept of the block here also covers a paragraph and the entire sentence, each of which can be regarded as a block. Furthermore, in the word information, "the stem of a transitive verb (with 's)" is recorded as the part of speech for "let's", and the let information is "1".

As shown in Figure 81, an operation for preparing the block of the sentence (9300) is carried out before the grouping of the English input sentence into blocks begins. The subsequent processing may be the same as in the flowchart shown in Figure 66. For the English sentence "I said, "Let's go to school"", for example, block 0 (start: beginning of the sentence, end: end of the sentence, role: sentence, goal: sentence) is formed.

As shown in Figure 81, in the structural syntax analysis section I 9020 each of the structural constituents is recognized as a block on the basis of the English data supplied to it, and its goal and role are estimated (9120). If no block is present in the constituent, parsing ends. If blocks are present in the input sentence, the innermost block is analyzed first (9122). Although several solutions are generally obtained during parsing, only the solution collected as a whole into a single CFG symbol is selected from among them (9123). The subsequent operations are the same as shown in Figure 73.

Such let information processing is carried out in the let information processing section 9200 in accordance with the processing flowcharts shown by way of example in Figures 83A and 83B. First, a pointer is set at the beginning of the block (9330) in order to check the word at the beginning of the block (9331). If the let information is "0", the pointer is advanced by one step (9339) to move on to the next word.

If the let information is not "0", the preceding dictionary lookup unit is checked (9332). If this is not a punctuation mark, or if the pointer is not at the beginning, the pointer is advanced by one step (9339) to continue with the next word.

If the check shows that the preceding dictionary reference unit is a punctuation mark, or if the pointer is at the beginning, the innermost block of the layer that contains the word is marked (9333).

If the let information then equals "1" (9334), meaning that "let's" or "let us" occurs immediately after the punctuation, the role of the marked block is recognized as "inviting sentence" (9336). If the information equals "2", which corresponds to "please", the role of the marked block is recognized as "requesting sentence" (9335). The purpose of the marked block is then recognized as imperative (9337), and the word information designated by the pointer is deleted (9338). The pointer is then advanced by one step (9339) to move on to the next word. Processing continues up to the word in the last position (9340).

Figure 80 shows examples of the parsing result after such let-information processing has been applied to the above-mentioned input sentence: I said, "Let's go to school". When let information has been attached to the word information, the let-information processing section 9200 removes from the table the information for the word to which the let information relates, and the block information is written as "imperative" for the purpose and "inviting sentence" for the role, as shown in Figure 80.
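The let-information steps 9330 to 9340 can be sketched roughly as follows. This is only an illustrative sketch: the dictionary-style word and block records, the function names and the punctuation set are assumptions introduced here, not structures given in the specification, and the sketch assumes that a sentence-level block always exists.

```python
# Hypothetical punctuation set; the specification does not enumerate one.
PUNCTUATION = {",", ".", "?", "!", ":", ";"}


def innermost_block(blocks, pos):
    """Return the narrowest block whose span contains word position pos."""
    containing = [b for b in blocks if b["start"] <= pos <= b["end"]]
    return min(containing, key=lambda b: b["end"] - b["start"])


def process_let_information(words, blocks):
    """Drop flagged words and annotate the innermost block around them.

    Each word is a dict with "pos" (1-based word number), "text" and
    "let": 0 = none, 1 = "let's"/"let us", 2 = "please".
    """
    kept = []
    previous = None
    for word in words:
        at_block_start = any(b["start"] == word["pos"] for b in blocks)
        after_punctuation = previous is not None and previous["text"] in PUNCTUATION
        previous = word
        if word["let"] == 0 or not (at_block_start or after_punctuation):
            kept.append(word)  # nothing to do, move on to the next word
            continue
        block = innermost_block(blocks, word["pos"])
        block["role"] = "inviting sentence" if word["let"] == 1 else "requesting sentence"
        block["purpose"] = "imperative"
        # the flagged word itself is not copied to the output
    return kept
```

For the sentence I said, "Let's go to school", with block 0 covering words 1 to 7 and block 1 covering words 4 to 7, the sketch drops "Let's" and rewrites only block 1.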

When words joined by a hyphen are handled in an English sentence and the dictionary 9018 is consulted with the whole hyphenated word, processing continues if an entry for it is present in the dictionary 9018. If a hyphenated word that is not registered in the dictionary 9018 is treated in its entirety as an unknown word, for example as an adjective, it cannot be translated, because the dictionary information of the individual words joined by the hyphen cannot be used. Moreover, if entries for the individual constituents of the hyphenated word are present in the dictionary 9018, they should not be ignored. On the other hand, if parsing is carried out by splitting such combinations into their constituents, the ways in which the words can be linked are highly varied.

To solve the problem identified above, the hyphenated words are parsed as an adjective within the sentence, a separate parse is performed only for the interior of the hyphenated word using its constituents, and the results are combined. This makes it possible to parse hyphenated words while using the information of each of their constituents. That is, hyphenated words that are not registered in the dictionary 9018 are treated as a whole in the same way as an adjective. The nouns joined by the hyphen are looked up in the dictionary, and parsing is performed in closed form only for the interior of the hyphenated words.

That is, if a hyphenated word is not registered in the dictionary 9018, block information covering the whole of it is output as a block, and dictionary lookup is performed for each of its constituents within the block, with the hyphen left out, in order to retrieve the respective unit information.

For words that the dictionary lookup does not find, ending-estimation processing is carried out as the unknown-word processing.

Such hyphen processing can be carried out in the example structure shown in Figure 64. In this case, the position of a word in the sentence is expressed not by the number assigned to the word but by the number of characters from the beginning of the sentence, that is, the character number.

Figure 84 illustrates an example of the processing of hyphenated words carried out in the morphological analysis section 9016. For an input English sentence such as "The anti-war attitude is her open-door policy.", the position pointer is advanced step by step to fetch a word (9135) and consult the dictionary (9353). In this case the hyphen is not used as a word delimiter. If an entry is present (9353), the word information is read out (9359). This is repeated until the end of the sentence.

If the dictionary search finds no entry (9352) and the word does not contain a hyphen (9354), the word information is read out (9359); if the word does contain a hyphen, a hyphen block is registered (9355). In the hyphen block, the start position is the start position of the hyphenated word and the end position is the end position of the hyphenated word. The purpose is optional and the role is adjective/noun. The hyphen is then removed to obtain each of the constituent words separately (9356), and the dictionary is consulted with each constituent word (9357). The information found by the dictionary search is written out (9358). When word information is output in steps 9359 and 9358 for a word that is not registered in the dictionary, it is processed with part of speech = word not registered in the dictionary.

Figure 85 shows examples of the block information and the word information that are produced together as the result of processing the English input sentence of this example. In this example, the hyphenated word "anti-war" is registered in the dictionary 9018 and the hyphenated word "open-door" is not. Therefore, the entry for the hyphenated word "anti-war" is recorded as the information for that word. The hyphenated word "open-door", however, is split into "open" and "door", which are written as word information for the individual words, and block 1 (start: 30, end: 38, purpose: optional, role: adjective/noun) is recorded as block information.
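The hyphen handling of Figures 84 and 85 can be sketched as below. This is a minimal sketch under stated assumptions: the dictionary 9018 is modelled as a plain set of lower-case strings, the record layouts and function names are hypothetical, positions are 1-based character numbers, and punctuation is simply skipped rather than emitted as word information.

```python
import re


def word_info(text, start, dictionary):
    """Word record; "registered" is False for unknown words."""
    return {"text": text, "start": start, "registered": text.lower() in dictionary}


def analyze_hyphenated(sentence, dictionary):
    """Return (words, blocks) for a sentence, splitting only unregistered hyphenated words."""
    words, blocks = [], []
    # The hyphen is not a word delimiter: "open-door" is fetched as one token.
    for match in re.finditer(r"[A-Za-z]+(?:-[A-Za-z]+)*", sentence):
        token = match.group()
        start, end = match.start() + 1, match.end()  # 1-based, inclusive
        if token.lower() in dictionary or "-" not in token:
            words.append(word_info(token, start, dictionary))
            continue
        # Unregistered hyphenated word: register a block for the whole span ...
        blocks.append({"start": start, "end": end,
                       "purpose": "optional", "role": "adjective/noun"})
        # ... then look up each constituent without the hyphen.
        offset = start
        for part in token.split("-"):
            words.append(word_info(part, offset, dictionary))
            offset += len(part) + 1
    return words, blocks


sentence = "The anti-war attitude is her open-door policy."
dictionary = {"the", "anti-war", "attitude", "is", "her", "open", "door", "policy"}
words, blocks = analyze_hyphenated(sentence, dictionary)
```

With this input, "anti-war" is kept as one word because it is registered, while "open-door" yields the constituents "open" and "door" and one block spanning characters 30 to 38, matching the values quoted for Figure 85.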

Although the form of the English tag question (the "additive interrogation") is very restricted, its processing is very complicated in the conventional parsing method. Furthermore, it is not easy to determine the verb to which the tag question relates.

Once a tag question has been recognized from the formal properties of the sentence, it is treated, in view of the above, as information belonging to the structural construct to which it is attached, so that the verb at the center of the tag question can be identified. That is, the tag-question portion of the English sentence is detected as a structural pattern, and parsing is carried out with the tag-question portion regarded purely as information attached to a particular kind of structural arrangement.

In the present embodiment, a unit or block is written using symbols (a symbol for the start point and a symbol indicating the end point of the unit or block).

In the morphological analysis, the text of the input sentence is transformed, and block recognition is carried out at the same time. In the present embodiment, a quotation mark is denoted by "Q" and a bracket by "P". The following notations are used:
for '...' is written /(Q'.../)',
for "..." is written /(Q".../)",
for (...) is written (/(P.../)),
for ... is written /(P.../),
for ... is written /(P.../) and
for [...] is written [/(P.../)].

Block recognition is carried out in the same way.

The start symbol and the end symbol of a block are applied only in a context in which these symbols actually open or close the block. The character immediately before the start symbol and the character immediately after the end symbol must be something other than an alphanumeric symbol. Occurrences of the above symbols that do not meet this condition are treated as mere symbols. Blocks may be nested, provided that they do not cross each other.
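The context rule for block symbols can be illustrated with a small sketch. It is an assumption-laden illustration, not the embodiment's procedure: only round and square brackets are handled, the function name is hypothetical, and a stack is used as one simple way to allow nesting while refusing crossing blocks.

```python
# Bracket pairs handled by this sketch; quotation marks are left out
# because their opening and closing forms are identical.
PAIRS = {"(": ")", "[": "]"}


def find_blocks(text):
    """Return (start_index, end_index) pairs of bracketed blocks, innermost first."""
    stack, blocks = [], []
    for i, ch in enumerate(text):
        before = text[i - 1] if i > 0 else " "
        after = text[i + 1] if i + 1 < len(text) else " "
        if ch in PAIRS and not before.isalnum():
            # Opens a block only when not glued to a preceding letter or digit.
            stack.append((ch, i))
        elif stack and ch == PAIRS[stack[-1][0]] and not after.isalnum():
            # Closes only the most recently opened block, so blocks never cross.
            _, start = stack.pop()
            blocks.append((start, i))
        # Anything else, including "f(x)", is treated as a mere symbol.
    return blocks
```

In "He said (see [note 3] below) that f(x) holds." the sketch reports the square-bracket block and the enclosing round-bracket block, and ignores the brackets of "f(x)" because the opening bracket directly follows a letter.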

In tag-question processing, if one of the following word groups follows at the moment the pointer is at ",", the cluster up to "?" is removed as a unit and a block flag is set. That is, a tag question takes one of the following forms:
", (auxiliary verb) + (personal pronoun) ?"
", (auxiliary verb)n't + (personal pronoun) ?"
", (auxiliary verb) + (personal pronoun) + not ?"

The auxiliary verbs include: am, is, are, was, were, do, does, did, have, has, had, will, shall, would, should, can, cannot, could, may, might, must, ought, won't, shan't, need, dare, used. The personal pronouns include: I, you, he, she, it, we, they.


These are used as information for the innermost block layer to which they belong. For example, in the English sentence "You said so, didn't you?", the whole is recognized as one block in terms of the structural construct, as in [you said so,] <with tag question>. Similarly, in the English sentence I said, "you said so, didn't you?", the quoted question "you said so, didn't you?" is recognized as block 1 in terms of the structural configuration, and the whole is recognized as block 2. That is, [I said, [you said so,] <with tag question>].

A contracted word such as "didn't" is processed after being expanded into its fully spelled form in accordance with a predetermined table. For words that have several contracted forms, all of these forms are output.
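A table-driven expansion of this kind might look like the sketch below. The table contents and the function name are purely illustrative assumptions, since the specification does not list the predetermined table; the entry with two expansions is one possible reading of "all of these forms are output".

```python
# Hypothetical expansion table; the actual predetermined table is not given.
EXPANSIONS = {
    "didn't": ["did not"],
    "isn't": ["is not"],
    "won't": ["will not"],
    "he's": ["he is", "he has"],  # assumed example of several candidates
}


def expand(word):
    """Return every fully spelled candidate for a word (the word itself if not in the table)."""
    return EXPANSIONS.get(word.lower(), [word])
```

Returning a list even for unambiguous words keeps the caller uniform: later stages can always iterate over the candidates.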

To carry out such tag-question processing, a tag-question processing section 9210 is provided between the morphological analysis section 9016 and the syntax analysis section I 9020 in another modification of the present embodiment. Figure 87 illustrates these sections together.

In this figure, elements identical to those shown in Figure 64 are denoted by the same reference numerals.

The tag-question processing section 9210 receives the result of the morphological analysis together with the input English sentence from the analysis section 9016 and, as shown in Figure 88, builds a block for the sentence. In the example shown in the figure, block 0 is (start: 1, end: 12, purpose: sentence, role: sentence). In this modified embodiment, a word is identified by its word number. In this embodiment, a block covers, for example, a sentence in addition to a clause and a phrase. In this case, the concept also extends to a paragraph and to the entire sentence, each of which can be regarded as a block.

The collective construction of blocks for an English input sentence containing a tag question may be the same as in the flowchart of Figure 81 described above. That is, processing 9300 for preparing the block of the sentence is carried out before the operation begins. For example, for the English sentence I said, "it is good, isn't it?", block 0 (start: beginning of sentence, end: end of sentence, role: sentence, purpose: sentence) is formed.

In the syntax analysis section I 9020, the analysis is carried out according to the same flowchart as that shown in Figure 92.


The analysis in the tag-question processing section 9210 is explained with reference to Figures 90A and 90B. First, a pointer is set to the word at the beginning of the word information (9370). If the word is not a comma, the pointer is advanced by one step (9384), and this is repeated until the end of the sentence (9371). It is then checked, with the pointer left where it is, whether the word following the comma belongs to the α group or to the β group (9373, 9379). Here, words whose part of speech includes auxiliary verb or verb and which are not in the negative form are defined as belonging to the α group, while words whose part of speech includes the negative form of an auxiliary verb or of a verb are defined as belonging to the β group. If the word belongs to neither group, the pointer is advanced by one step (9384) and the procedure is repeated until the end of the sentence (9371).

If the word belongs to the α group and the word following it is not a pronoun, the pointer advance step 9384 is carried out. If it is a pronoun, it is checked whether the next word is "not" (9375); if it is not "not", it is checked whether the word following the pronoun is a question mark (9377). If it is not a question mark, the pointer advance step 9384 is carried out. If it is a question mark, the purpose of the innermost block is rewritten as "negative sentence" and the role is changed to "tag-question sentence" (9378), and the tag question is removed from the word information table (9383). The innermost block is the block that satisfies both start position ≤ (position of ",") and end position ≥ (position of "?") and has the minimum value of (end position - start position).

If the word following the pronoun is "not" in step 9375, it is checked whether the word following "not" is a question mark (9376). If it is not a question mark, the pointer advance step 9384 is carried out. If it is a question mark, the purpose of the innermost block is rewritten as "affirmative sentence" and the role is changed to "tag-question sentence" (9382), and the tag question is removed from the word information table (9383).

If in step 9379 the word following the comma belongs to the β group, the pointer advance step 9384 is carried out if the word following the β-group word is not a pronoun. If it is a pronoun, it is checked whether the subsequent word is a question mark (9381), and the pointer advance step 9384 is carried out if it is not. If it is a question mark, the purpose of the innermost block is rewritten as "affirmative sentence" and the role is changed to "tag-question sentence" (9382), and the tag question is removed from the word information table (9383). The pointer is then advanced by one step (9384), and the procedure is repeated until the end of the sentence.

Figure 88 shows, for example, the block information and word information passed from the morphological analysis section 9016 to the tag-question processing section 9210 for the English sentence I said, "It is good, isn't it?" described above. The block information for block 1 is (start: 4, end: 12, purpose: optional, role: optional). When this block undergoes tag-question processing in the tag-question processing section 9210, the information for block 1 is rewritten as (start: 4, end: 12, purpose: affirmative sentence, role: tag-question sentence), and at the same time the word information relating to the tag question, 8-11, is removed.
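The tag-question (additive interrogation) flow of steps 9370 to 9384 can be sketched as follows. This is a simplified sketch under several assumptions: tokens are plain strings numbered from 1, quotation marks count as tokens, the α group is taken to be the listed non-negative auxiliaries and the β group any form ending in "n't" or the word "cannot", a sentence-level block always exists, and all function names are hypothetical.

```python
ALPHA = {"am", "is", "are", "was", "were", "do", "does", "did", "have", "has",
         "had", "will", "shall", "would", "should", "can", "could", "may",
         "might", "must", "ought", "need", "dare", "used"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def group(word):
    """Classify a word as 'alpha' (positive), 'beta' (negative) or None."""
    w = word.lower()
    if w.endswith("n't") or w == "cannot":
        return "beta"
    return "alpha" if w in ALPHA else None


def match_tag(tokens, i):
    """If a tag question starts at the comma tokens[i], return (index of '?', purpose)."""
    def at(k):
        return tokens[k].lower() if k < len(tokens) else ""
    if at(i) != "," or at(i + 2) not in PRONOUNS:
        return None
    kind = group(at(i + 1))
    if kind == "alpha" and at(i + 3) == "not" and at(i + 4) == "?":
        return i + 4, "affirmative sentence"
    if kind == "alpha" and at(i + 3) == "?":
        return i + 3, "negative sentence"
    if kind == "beta" and at(i + 3) == "?":
        return i + 3, "affirmative sentence"
    return None


def process_tag_questions(tokens, blocks):
    """Rewrite the innermost block for each tag question and drop its tokens."""
    removed = set()
    for i in range(len(tokens)):
        hit = match_tag(tokens, i)
        if hit is None:
            continue
        end, purpose = hit
        comma_pos, qmark_pos = i + 1, end + 1  # 1-based word numbers
        candidates = [b for b in blocks
                      if b["start"] <= comma_pos and b["end"] >= qmark_pos]
        block = min(candidates, key=lambda b: b["end"] - b["start"])
        block["purpose"], block["role"] = purpose, "tag-question sentence"
        removed.update(range(i, end + 1))
    return [t for k, t in enumerate(tokens) if k not in removed]
```

Applied to the twelve tokens of I said, "It is good, isn't it?" with block 0 (1 to 12) and block 1 (4 to 12), the sketch removes tokens 8 to 11 and rewrites only block 1, in line with the values given for Figure 88.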


Claims

If the distinguishing indication is present in the words found, the dictionary reference unit for which said distinguishing indication was found is combined with a nearby dictionary reference unit for which another distinguishing indication was found, the numeric values represented by the two dictionary reference units are combined into a single numeric value, and the dictionary reference units are converted into a single parsing unit.

A language analysis device, comprising: a word list memory in which word list data is stored including morpheme data for words, compound words and sentences, and a parser for performing a morphological analysis for an entered sentence with reference to the said phrase. words! ice memory, wherein said words ice memory contains data indicative of the degree of coupling between each of the words belonging to compound words or sentences and said parser refers to the word memory for the respective words contained in the entered sentence and, if a number of glossary data is found for a word that combines with other words, then the combination of words with a higher degree of coupling is selected by referring to the degree of coupling data.

2. Language analysis device as claimed in claim 1, wherein if the degree of coupling data indicates a high degree of coupling for a number of combinations between a word taken out of the word list and other words, the parser produces preferred data with a given preference for one of the said number of combinations according to a certain rule.

3. Language analyzer, comprising: 25 input means for inputting a character array in a predetermined language, a basic glossary memory used to search for the input character array and storing basic data, and a parser for analyzing of the input character array by searching the basic dictionary where said parser addresses the basic dictionary memory using the input character array and if thereby a portion of said character array is found, the basic dictionary memory is similarly addressed with other parts of said character array to thereby analyze the entire character array.

The language analysis device according to claim 3, wherein the basic vocabulary memory is provided with a fundamental unit 40 vocabulary memory containing data stored therein with which dimension 87 0. V le units are expressed and said parsing device accesses the fundamental unit glossary memory using said input character array to thereby determine by parsing whether said character array expresses a dimensional unit or not.

5. Language analysis device according to claim 4, wherein the parsing device accesses the fundamental unit of vocabulary memory using the character array and, if said character array consists exclusively of a combination of character arrays stored in the fundamental unit of vocabulary memory with which a unit is expressed then, as a result of the search procedure, the character array is judged as a unit expression.

6. Language analysis device according to any one of claims 3 to 5, wherein the parsing device comprises a pointer, which pointer is set at a character at the beginning of the input character array and, if a part of the character array is found when looking up in said basic glossary memory using said character array starting with the character with the pointer set, then the pointer is set to a character array following the portion of the recovered character array, and said basic dictionary memory is addressed again using the following character array, with the pointer now set.

The language analysis device according to any one of claims 3 to 6, wherein the input character array is searched in the usual word list memory and not in the basic word list memory.

8. Language analyzer comprising: a glossary memory in which glossary data is stored for each word list reference unit, and a parsing memory for dividing an input phrase into word list reference units and performing a morphological analysis of said words list reference units. wherein reference is made to the vocabulary memory, said vocabulary memory being provided with the vocabulary data with distinguishing indicia indicating whether the vocabulary reference units represent numbers and said parser referring to the vocabulary memory using the respective vocabulary reference units 40 contained in the entered sentence and if the aforementioned 8702359

9. The language analyzer of claim 8, wherein the parser, if the numeric value is accompanied by a word list reference unit that expresses a currency symbol or a dimensional unit, combines that word list reference unit together with the numeric value into a single parsing unit.

10. Language analysis device comprising: input means for inputting a character array in a predetermined language, a word list memory which is used for looking up said input character array, search means for searching said word list memory by means of the input character array, and type information providing means for providing type information to a character array that is not registered in the word list memory and to a character array whose type information is not recorded in said word list memory but which does belong to the input character arrays, the type information providing means providing a number of items of type information to said character array which does not have type information.

11. A language analyzer according to claim 10, wherein said analyzer further comprises: unit excision means for dividing the input character array into word list search units, and parsing means for analyzing a character array that has been divided off by the unit excision means and searched by the search means, together with the preceding character array.

12. Language analysis device according to claim 11, wherein, if type information is present for one of said character array and the character array preceding it and no type information is present for the other, the parsing means provides the character array without type information with the type information of the other character array.

13. Language analyzer according to claim 11, wherein, if type information is present both for the character array and for the preceding character array, the parsing means provides the common type information as type information to both the character array and the preceding character array.

14. A language analysis device according to any of claims 2 to 4, wherein the character array and the preceding character array, analyzed by said parsing means, are proper names.

15. Language analysis device according to claim 11, wherein, if the character array starts with a capital letter and the preceding character array is at the end of a sentence, the parsing means converts the capital letter at the beginning of said character array to a lower-case letter and then searches the word list memory again by means of the search means and, if no entry is found in the word list memory during said search, analyzes said character array as an unregistered proper name.

16. Language analyzer comprising: a word list memory in which word list data is stored for all word list reference units, and parsing means for dividing an input sentence into word list reference units and performing a morphological analysis by searching the word list memory using the word list reference units, wherein the word list data in the word list memory includes distinguishing information specifying the position of a word list reference unit representing a proper name where a plurality of proper names can occur in a consecutive sequence, and the parsing means searches the word list memory using the respective word list reference units present in the entered sentence and, if the distinguishing information is contained in the word list data found, combines the word list reference unit for which the distinguishing information is found with the directly adjacent word list reference unit, which has a meaning other than that of a proper name, into a single parsing unit according to the position specified by the distinguishing information.

17. Language analysis device comprising: input means for inputting a character array in a predetermined language, a word list memory used for looking up said character array input through the input means, and type information parsing means for searching the word list memory using the input character array and parsing the type information of said character array, the type information parsing means parsing the type information of the character array taking into account the type information of the character arrays before and after said character array.

18. The language analysis device of claim 17, wherein the type information parsing means parses the type information for a number of character arrays by combining said character arrays collectively.

19. The language analysis device according to claim 17, wherein, if type information is present for one of said character array and the character array preceding it and is not present for the other, the character array without type information is provided with the type information of the other character array.

20. The language analysis device according to any of claims 17-19, wherein the character array parsed by said type information parsing means is a proper name.

21. Language analyzer comprising: a word list memory in which word list data for all word list reference units is stored, and parsing means for dividing an input sentence into word list reference units and performing a morphological analysis of said word list reference units by searching said word list memory, wherein said parsing means recognizes, according to a given rule, that a series of word list reference units having a specific semantic element constitutes a composite unit having a specific meaning, and converts the series of word list reference units with said specific semantic element into a single parsing unit.

22. The language analyzer according to claim 21, wherein the word list memory contains data for distinguishing word list reference units with a specific semantic element, said parsing means has a discrimination matching table for recognizing that a series of word list reference units with said specific semantic element constitutes a composite unit expressing a specific meaning under certain rules, and the parsing means searches said word list memory using the respective word list reference units from the input sentence and, if word list reference units are distinguished as having said specific semantic element, converts the series of those word list reference units into a single parsing unit in accordance with the discrimination matching table.

23. Language analysis device in which the grammatical nature, the semantic nature or the translated word is estimated for a word that is not registered in the word list but is recognized from its morpheme properties, such as a suffix, as a derivative.

24. Language analyzer, comprising: a first parsing means for performing a morphological analysis on an entered sentence in a predetermined language, a second parsing means for analyzing the sentence in said language based on the result of the morphological analysis in the first parsing means, a word list memory storing word list data of said language that is used for the analysis by said first and second parsing means, and control means for searching said word list memory and for executing the parsing procedure in the first and second parsing means, wherein said first parsing means searches the word list memory, distinguishes a structural arrangement by distinguishing the characteristics of the sentence entered in said language with respect to its form, and estimates the attitude that may result from the parsing and the structural role that said arrangement fulfils in said sentence, and said second parsing means analyzes the surface structure of the sentence in said language using grammatical rules based on the estimated attitude and role, and analyzes the possible subordinate relationships of the constituent parts of said sentence.

25. The language analysis device of claim 24, wherein the second parsing means performs the analysis of the arrangement with preference over others if the arrangement is present in the sentence in said language.

26. A language analysis device according to claim 24, wherein the first parsing means distinguishes the properties of the sentence in the given language with respect to its form and meaning and estimates its adjective expression based on the distinguished properties.

27. The language analyzer according to claim 24, wherein the predetermined language is English, said word list memory is provided with word list data containing distinguishing information that distinguishes predetermined words including "let's" and "let us", and said first parsing means, if said distinguishing information is obtained when searching said word list memory, excludes "let's" from the analysis in the second parsing means, and excludes "let us" from the analysis in said second parsing means if punctuation is present in the preceding part.

28. The language analysis device of claim 27, wherein the first parsing means estimates, for the arrangement contained in the excluded portion, the attitude as an instruction and the role as an invitation.

29. A language analyzer according to claim 24, wherein the first parsing means, if no word list data is obtained when the word list memory is searched for a number of words joined by hyphens, searches the word list memory for each of said words, regards all the words together as one configuration, and estimates the attitude of said configuration to be that of an adjective group.

30. A language analysis device according to claim 24, wherein the predetermined language is English and the first parsing means distinguishes the properties of the English sentence with respect to its form, recognizes an additive interrogation on the basis of said distinguished properties, regards the whole of said additive interrogation group as a configuration, estimates its attitude to be that of an additive interrogation group, and excludes it from the analysis in the second parsing means.