CN1169555A - Computer input method of limited-semateme encoding of different natural language - Google Patents

Computer input method of limited-semateme encoding of different natural language

Info

Publication number
CN1169555A
Authority
CN
China
Prior art keywords
semantic
natural language
coding
notion
concept
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 96107009
Other languages
Chinese (zh)
Inventor
刘莎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN 96107009 (CN1169555A)
Priority to AU33336/97A (AU3333697A)
Priority to PCT/CN1997/000069 (WO1998000773A1)
Publication of CN1169555A
Legal status: Pending

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a computer text-information processing system, and in particular to a unified-code input method that uses restricted natural language to develop machine translation systems. At the lexical level, the unified semantic coding includes: establishing a shared concept system formed from basic-word figure codes and near-synonym extension codes, and using the shared concepts to give the concepts of different languages a semantically restricted unified coding. At the syntactic level, the unified semantic coding is characterized in that it converts the sentence-construction concept system into a spatially positioned sentence frame that both human and machine can recognize, expresses syntactic concepts by the positions of zones and cells within that frame, and uses those positions as the syntactic code.

Description

Computer input method for the semantically restricted unified coding of different natural languages
The present invention relates to a computer text-information processing system, and more precisely to a computer input method for the semantically restricted unified coding of different natural languages for machine translation.
For a long time, a large amount of computer processing work has been devoted to the language barrier between different peoples. The traditional approach is to formalize natural language logically and then have the computer carry out the translation.
Natural language contains a large body of rules, regularities, and probabilistic knowledge that a computer can process: every part of a language's forms of expression that the human brain can memorize by rote; all exceptionless vocabulary, part-of-speech collocations, and semantic matching relations; and all probabilistic linguistic knowledge obtained by statistics over a finite corpus. All of this information helps a machine translation system process language automatically.
However, natural language also contains a great deal that is unsuited to computer processing. First, the relation between natural-language symbols and their semantic content is in a constant cycle of creation, circulation, and acceptance, so the coining of new words, new senses of old words, and flexible shifts in grammar and part of speech are permanently in flux; this is the conventional, usage-driven character of natural language. Second, the semantic content that a symbol refers to is not determinate: when different experts explain the same article in the same natural language, different versions can result; this is the fuzzy character of natural language. Third, the understanding of natural-language semantics is strongly context-dependent: the same expression in the same language can be understood differently, and each time reasonably, depending on the linguistic and non-linguistic context, the communication partners, their background knowledge, and their ways of thinking; this is the inherent random character of natural language.
In sum, natural language is a system that depends sensitively on context, has unlimited self-creating capacity in its forms of semantic expression, and is to some degree unpredictable in how it is understood, and none of this can be "understood" or "computed" by a computer. Since there is therefore a boundary to what a computer can do in natural-language translation, machine translation that continues to confront pure natural language can only ever be a reference or aid for human translation.
The object of the present invention is to provide a computer input method for the semantically restricted unified coding of different natural languages, that is, to develop machine translation systems using restricted natural language. Starting from the recognition that machine translation of natural language is bounded by the machine's capabilities, and in order to fit the computer's natural-language processing ability, the relation between natural-language symbol forms and semantic content is first constrained by convention, and the restricted semantics are then given a unified coding. This effectively improves machine translation quality and addresses the communication barrier between different natural languages.
The computer input method for the semantically restricted unified coding of different natural languages according to the invention comprises unified semantic coding at the lexical level and unified semantic coding at the syntactic level, characterized in that:
The unified semantic coding at the lexical level comprises:
1) on the basis of statistics over multilingual natural-language synonyms, and with reference to semantic-classification head words and a primitive set of comparable size, establishing a shared concept system made up of the basic concepts of prepositions, conjunctions, verbs, adverbs, adjectives, and abstract nouns;
2) expanding each shared basic concept by degree-increase, degree-decrease, near-synonym, commendatory, derogatory, spoken, written, slang, colloquial, and idiom variants to build near-synonym extension codes;
3) using directly ideographic graphic symbols as the form code of the shared concepts;
4) annotating the shared concepts with definitions that need no explanation, by superimposing the meanings of near-synonyms from each natural language;
5) giving a unified coding to the shared concepts, each made up of a basic concept and its near-synonym extension codes, the code consisting of the number of the ideographic graphic symbol plus the near-synonym extension code;
6) establishing alignment relations between natural-language concepts and the shared concepts and their near-synonym extension codes, with a polysemous word receiving multiple codes;
7) when establishing the alignment between natural-language concepts and shared concepts, if a basic concept among the shared concepts has no equivalent in a given natural language, describing that basic concept by a definition in that natural language;
8) when establishing the alignment between natural-language concepts and shared concepts, if a near-synonym extension code among the shared concepts has no equivalent, substituting the basic concept or describing it by a definition in that natural language;
9) establishing a pictorial atlas of concrete nouns with corresponding concept number codes, used to disambiguate polysemous words;
10) establishing a correspondence database of specialized vocabulary across natural languages;
11) treating any natural-language vocabulary that receives no code through the above steps as a redundant concept that is not encoded;
The unified semantic coding at the syntactic level comprises:
1) converting the syntactic concept system into a spatially positioned sentence frame that human and machine both recognize, and expressing syntactic concepts by the positions of zones and cells in the sentence frame;
2) forming nine syntactic zones from the vertical modifier, core, and supplementary zones and the horizontal subject, predicate, and object zones, each zone being divided into 3 × 3 cells and each cell being numbered by its zone number and cell number; the shared concepts in a sentence of any natural language are placed, in ideographic figure-code and number-code form, into the zone and cell of the sentence frame that corresponds to their syntactic role, yielding the syntactic information code of the semantic expression.
The semantically restricted unified coding of different natural languages according to the invention is an easily recognized, operable public intermediary language for machine translation that can be used by people of different mother tongues.
The technique of the invention is further described below with reference to embodiments and the accompanying drawings.
Fig. 1. Schematic of example shared-concept figure code designs
Fig. 2. Schematic example of time-class meanings with their figure codes and numbers
Fig. 3. Alignment table of natural-language concepts and shared concepts for the basic word "laugh"
Fig. 4. Alignment table of natural-language concepts and shared concepts for the basic word "beautiful"
Fig. 5. Alignment table of natural-language concepts and shared concepts for the basic word "if"
Fig. 6. Schematic of the pictorial atlas of multilingual concrete nouns
Fig. 7. Schematic of the sentence frame structure
Fig. 8. Simple-sentence example in the sentence frame
Figs. 9 to 12. Complex-sentence examples in the sentence frame
Fig. 13. Logic block diagram of the restricted-semantics unified-coding machine translation system
The semantically restricted unified coding of the invention starts by taking the words for meanings common to every language and, using methods such as dictionary definition, symbolization, deduction, metaphor, and borrowing from sign language, processing each into a gestalt image and designing a figure code for it one by one. In Fig. 1, for example, the shared-concept figure codes for "walk" and "telephone" are designed by the descriptive definition method; "lazy" by the explanatory definition method; "abstract" by the typical-example definition method; "tall" and "short" by the contrastive definition method; "peace" by the symbol method; "diplomacy", "country", and "contacts" by the deduction method; "noble" by the metaphor method; and "because" and "so" by the sign-language reference method.
The descriptive definition method is used for concepts that have naturally definite visual features. The explanatory definition method is used for concepts whose image features can be obtained from the dictionary definition, such as "lazy", defined as "unwilling to work". When image features cannot be obtained directly from the explanatory definition, the typical-example method can be used; the dictionary definition of "abstract", for example, is "to extract the essential features from various things". Concepts with opposite meanings, such as "tall" and "short", get their visual expression by contrast. When the dictionary definition lends itself poorly to imagery, a commonly recognized symbolic image can be used: the dictionary definition of "peace" is "a state without war", and its common ideograph is a dove of peace carrying an olive branch. Deduction derives a figure code from the figure codes of other concepts, as "diplomacy" is derived from "country" and "contacts". Metaphor obtains a figurative expression by association from the word's meaning, as with "noble", defined as "of high moral standing ...". The sign-language reference method is also a sound way to obtain an image: some words for abstract concepts can borrow their expression from sign language.
Once the concept figure codes exist, figure codes with similar functional meanings are arranged in a coordinate matrix to form a class chart; the row and column coordinates of a figure code's position in the class chart are its class code. Figure codes of different meanings within one class are arranged page by page in a coordinate matrix to form position charts; the row and column coordinates of a figure code's position on a page are its position code. Class code, page number, and position code, in that order, make up a fixed-length five-digit number for every figure code, and the digits are entered with the numeric keys of the keyboard. Classification can proceed first by areas such as time, orientation, quantity, reference, association, grammar, raw materials (animal, plant, synthetic), physical motion, human motion, meteorology, physics, persons, life, social interaction, psychology, thinking, traffic, communication, finance, trade, economy, tourism, diet, entertainment, medical care, shopping, administration, culture and education, science and technology, politics, and military affairs, establishing class charts and class codes; position charts and position codes are then established page by page for each class. Fig. 2 shows some of the time-class concept figure codes and their numbers, covering year, month, week, day, hour, minute, second, daytime, early morning, morning, noon, afternoon, dusk, night, time, the day before yesterday, yesterday, today, tomorrow, the day after tomorrow, past, present, future, season, spring, autumn, summer, winter, period, early stage, middle stage, late stage, epoch, dynasty, Common Era, century, decade, ancient times, early modern times, modern times, contemporary times, and so on.
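The five-digit numbering described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the text says only that the class code, page number, and position code together form five digits, so the split into two coordinate digits, one page digit, and two coordinate digits is an assumption, as are the names `FigureCode`, `compose`, and `parse`.

```python
from typing import NamedTuple


class FigureCode(NamedTuple):
    """One figure code's coordinates (assumed split: 2 + 1 + 2 digits)."""
    class_row: int  # row of the class in the class chart
    class_col: int  # column of the class in the class chart
    page: int       # page within the class
    pos_row: int    # row of the figure code on that page
    pos_col: int    # column of the figure code on that page


def compose(code: FigureCode) -> str:
    """Join the five single-digit fields into a fixed-length number."""
    for digit in code:
        if not 0 <= digit <= 9:
            raise ValueError(f"each field must be a single digit, got {digit}")
    return "".join(str(digit) for digit in code)


def parse(number: str) -> FigureCode:
    """Split a five-digit number back into its assumed fields."""
    if len(number) != 5 or not (number.isascii() and number.isdigit()):
        raise ValueError(f"expected five ASCII digits, got {number!r}")
    return FigureCode(*(int(ch) for ch in number))
```

Keeping every field to one digit is what makes the number fixed-length and typeable on a numeric keypad; a chart larger than 10 × 10 would need a different split.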
The shared concept system at the lexical level is obtained by taking statistics of synonyms across different natural languages as the basis and comparing and screening against semantic-classification head words and a primitive set of corresponding size. The embodiment starts from the actual population distribution of natural-language use and from the practical requirement that shared concepts have the features of an isolating language: English and Chinese synonyms, semantic-classification head words, and a primitive set of corresponding size were compared and screened to form a basic shared concept system. Preliminary experiments gave a shared concept system of about 4,000 basic concepts, including more than 100 prepositions and conjunctions, more than 700 verbs, and more than 1,000 adverbs and adjectives, the rest being mainly abstract nouns.
To ensure that the shared concepts are "sufficient", they must also be adjusted as necessary on the basis of experiments converting English and Chinese word senses into shared concepts. It can be inferred, however, that the number of natural-language concepts that establish a correspondence with the shared concept system will tend to stabilize: if the shared concepts are too rich, each natural language's concept system becomes less able to correspond with them, and every added "non-natural" concept (one with no clear symbolic expression in some natural language) raises the user's learning cost. The mutual constraint between the principle of easy learning and the principle of sufficiency therefore objectively limits the freedom in choosing and designing shared concepts. Chinese "good", for example, has 16 senses marked in the dictionary; after removing those with too low a usage frequency and the non-standard ones, only 50% of the senses are selected for the concept correspondence database.
To reflect the richness of natural-language conceptual forms as far as possible, near-synonym extension codes such as degree, commendatory, derogatory, spoken, written, slang, colloquial, and idiom can be established for the shared basic concepts, strengthening the shared concept system's ability to encode natural-language vocabulary.
The invention uses directly ideographic graphic symbols as the form code of shared concepts. This avoids both the semantic confusion that would readily arise from using natural-language script directly as the form code and the excessive learning cost that abstract symbols would impose on users.
Because describing shared concepts semantically in natural language can hardly escape the randomness of natural-language understanding, the invention annotates shared-concept codes by superimposing the meanings of natural-language near-synonyms, giving each shared concept a definition that needs no explanation. The near-synonym method also yields alignment tables between the concepts of multiple natural languages and the shared concepts. For ease of computer processing, the shared concepts (basic concepts and their near-synonym extension codes) are also given unified number codes, which are listed in the alignment tables.
These steps give a shared concept system for semantic expression at the lexical level. In this system each shared concept has its meaning made "self-evident" by the superimposed meanings of near-synonyms from different natural languages, and each such concept has an ideographic figure code and a unique number code.
Figs. 3, 4, and 5 show the alignment tables between the shared concepts and the natural-language concepts of the basic words "laugh", "beautiful", and "if" in Chinese and English. The middle column gives the intermediary-language concept figure code, its number, and the near-synonym extension codes formed by degree-decrease, degree-increase, near-synonym, idiom, written, spoken, slang, colloquial, commendatory, and derogatory variants (the number followed by a, b, ... j), which together form the superimposed semantic definition. In Fig. 3 the Chinese near-synonym group lists "little laugh" (degree-decrease), "laughingly" (degree-increase), "covered with smiles" (idiom), and "sneer" (derogatory); the English group lists "smile" (degree-decrease), "grin" and "cackle" (near-synonym), and "sneer" (derogatory). Near-synonym groups for Russian, Japanese, and other languages can be listed on the same principle. In Fig. 4 the Chinese group lists "pretty" (degree-decrease), "beautiful" and "graceful" (near-synonym), "too beautiful to be absorbed all at once" (idiom), "good-looking" (spoken), and "enchanting" (derogatory); the English group lists "pretty" (degree-decrease), "handsome" and "dainty" (near-synonym), "personable" (spoken), and "tawdry" (derogatory). In Fig. 5 the Chinese group lists "if" (near-synonym), "if" and "if" (written), "just in case" and "if" (spoken), and "will be" (colloquial), where each repeated gloss stands for a distinct Chinese word; the English group lists "in case" (near-synonym) and "in the event of" (written).
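The alignment table of Fig. 3 can be sketched as a small lookup structure. This is a hedged illustration: the concept number `"00000"` is a placeholder, since the text does not reproduce the real number for "laugh", and the assignment of the letters a to j to the ten variants in the order listed is an assumption read off that list. The names `split_code` and `words_for` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed mapping: letters a..j follow the order in which the text lists the variants.
EXTENSION_KINDS = [
    "degree-decrease", "degree-increase", "near-synonym", "idiom", "written",
    "spoken", "slang", "colloquial", "commendatory", "derogatory",
]
LETTER_TO_KIND: Dict[str, str] = dict(zip("abcdefghij", EXTENSION_KINDS))


def split_code(code: str) -> Tuple[str, Optional[str]]:
    """Split a shared ("total") concept code into number and extension kind."""
    if code and code[-1] in LETTER_TO_KIND:
        return code[:-1], LETTER_TO_KIND[code[-1]]
    return code, None


LAUGH = "00000"  # placeholder number, not taken from Fig. 3

# English near-synonym group for "laugh" as described for Fig. 3.
ENGLISH_GROUP: Dict[str, List[str]] = {
    LAUGH: ["laugh"],
    LAUGH + "a": ["smile"],
    LAUGH + "c": ["grin", "cackle"],
    LAUGH + "j": ["sneer"],
}


def words_for(code: str, table: Dict[str, List[str]]) -> List[str]:
    """Return the words aligned to a code in one language.

    When the extension code has no equivalent in this language, fall back
    to the basic concept, which the text allows as a substitute.
    """
    base, _kind = split_code(code)
    return table.get(code) or table.get(base, [])
```

A table of the same shape per language (Chinese, Russian, Japanese, and so on) would let the same code be rendered in each of them.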
When the invention applies unified semantic coding to the lexical-level concept systems of different natural languages, it can proceed as follows:
a. Establish alignment relations between the word senses of each natural language and the shared concepts, so that a polysemous word receives multiple codes. The senses of Chinese "good", for example, can include:
(1) the opposite of "bad", with the near-synonym group "excellent, smart, good, wonderful, outstanding, get home, ...";
(2) friendliness, with the near-synonym group "friendly, friendly, harmonious, congenial, ...";
(3) health, with the near-synonym group "solid, strong, sound, hale and hearty, ...";
(4) agreement, with the near-synonym group "approval, approval, passable, ...";
(5) ease, with the near-synonym group "easily, easy, convenient, simple and easy, ...";
(6) convenience, with the near-synonym group "make things convenient, save trouble, light, convenient, ...";
(7) "very", with the near-synonym group "especially, especially, exceptionally, very much, very, ...";
(8) liking, with the near-synonym group "like, like, like, like, like, ...";
b. For concrete nouns, a multilingual concrete-noun correspondence database can be built through a pictorial atlas, so that ambiguity in polysemous words can be resolved; see Fig. 6.
When a natural-language word is polysemous, the atlas can be called up for the user to select the meaning. Chinese "flower", for example, has distinct meanings as a noun (the reproductive organ of a flowering plant), a verb (to spend or use), and an adjective (multicolored or varied). When the machine translation system has difficulty determining which meaning the user intends, "flower" (51) in the pictorial atlas can be called up on the computer interface for the user to disambiguate, which speeds up the user's semantic selection.
c. For specialized vocabulary, a multilingual specialized-vocabulary correspondence database can be built;
d. When aligning the lexical-level concepts of a natural language with the shared concepts, if a shared basic concept has no equivalent, it is described by a definition in that natural language; if a near-synonym extension code has no equivalent, the shared basic concept can be described by a definition in that natural language, or the basic concept can fill the vacancy as a substitute, as shown in Figs. 3 to 5;
e. Any natural-language concept (word sense) that still cannot be aligned with the shared concept system by methods a to d is treated as a redundant concept and is not encoded. Through steps a to e, the lexical-level concepts of multiple natural languages obtain a restricted unified semantic coding, and every encoded word sense has a clear semantic convention.
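Steps a to e can be sketched as a single lookup routine: a word with one aligned sense is encoded directly, a polysemous word is handed to the user for selection, and a word with no aligned sense is left unencoded as a redundant concept. Everything concrete here is an assumption for illustration: the codes in `LEXICON` are placeholders, and `encode_word` and the `choose` callback are hypothetical names standing in for the interface that would show the candidate senses (or the pictorial atlas) to the user.

```python
from typing import Callable, Dict, List, Optional, Tuple

Sense = Tuple[str, str]  # (short gloss of the sense, shared-concept code)

# Placeholder lexicon; the codes are invented for this sketch.
LEXICON: Dict[str, List[Sense]] = {
    "good": [
        ("opposite of bad", "10001"),
        ("friendliness", "10002"),
        ("health", "10003"),
    ],
    "if": [("condition", "20001")],
}


def encode_word(
    word: str,
    lexicon: Dict[str, List[Sense]],
    choose: Callable[[List[Sense]], Sense],
) -> Optional[str]:
    """Return the code for a word, or None for a redundant concept."""
    senses = lexicon.get(word, [])
    if not senses:
        return None  # step e: no alignment exists, so the word is not encoded
    if len(senses) == 1:
        return senses[0][1]
    # Polysemous word: defer to the user's selection among the senses.
    return choose(senses)[1]
```

For example, `encode_word("good", LEXICON, lambda senses: senses[1])` simulates a user picking the second sense.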
The unified semantic coding at the syntactic level of the invention is the visual expression of the relations among lexical-level concepts. To meet the requirements of machine translation and fit the natural-language processing ability of computer technology, the implicit syntactic concepts of natural language must be raised to the surface and then given unified concept codes.
In existing machine translation technology, the final result of syntactic analysis is usually expressed as a syntax tree, and the grammatical concepts contained in this tree are precisely the syntactic information that must be obtained from the source language for machine translation. To make the restricted natural language scheme of the invention simple and practical, the syntactic concept system expressed by the syntax tree is converted into a spatially positioned sentence frame, in which zone position is the visual expression of a syntactic concept.
The sentence frame shown in Fig. 7 comprises nine zones, (1) to (9), each made up of nine cells, -1 to -9. In the figure, solid boxes are zones and dashed boxes are cells. Each cell's number is the zone number followed by the cell number; cell -2 of zone (1), for example, is coded 1-2. Zones (1), (2), and (3) are the core subject, predicate, and object zones; zones (4), (5), and (6) are the modifier zones of zones (1), (2), and (3); and zones (7), (8), and (9) are the supplementary zones of zones (1), (2), and (3).
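The zone layout just described follows a regular pattern that a few lines of Python can capture. This is a sketch under the numbering given in the text (zones 1 to 3 core, 4 to 6 modifier, 7 to 9 supplementary, each serving subject, predicate, and object in turn); the function names are hypothetical.

```python
from typing import Tuple

LAYERS = ["core", "modifier", "supplementary"]  # zones 1-3, 4-6, 7-9
ROLES = ["subject", "predicate", "object"]      # repeats within each layer


def _check(value: int, what: str) -> None:
    if not 1 <= value <= 9:
        raise ValueError(f"{what} must be between 1 and 9, got {value}")


def zone_info(zone: int) -> Tuple[str, str]:
    """Return (layer, role) for a zone ("district") number."""
    _check(zone, "zone")
    return LAYERS[(zone - 1) // 3], ROLES[(zone - 1) % 3]


def host_zone(zone: int) -> int:
    """Return the core zone that a modifier or supplementary zone serves."""
    _check(zone, "zone")
    return (zone - 1) % 3 + 1


def cell_label(zone: int, cell: int) -> str:
    """Return the cell ("lattice") number, e.g. zone 1, cell 2 gives '1-2'."""
    _check(zone, "zone")
    _check(cell, "cell")
    return f"{zone}-{cell}"
```

Because position alone carries the syntactic role, a placed concept needs no separate part-of-sentence tag: `zone_info(4)` already says it modifies the subject.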
The usage rules of the sentence frame are introduced below with reference to Fig. 7.
1. The cells of zones (1) and (3) follow the same rules:
a. a single subject word or object (predicative) word serving as the sentence core is placed in cell -1 of its zone;
b. appositive subject words and predicate words are placed side by side in cell -1 of the zone;
c. coordinate subject words and predicate words are placed side by side, in order, in cells -4, -5, -6, -3, -9 of the zone; when there are more than five, the rest are also placed in order in cell -9;
d. coordinate appositive subject words and predicate words are placed as in c;
e. the subject, predicate (copula), and object (predicative) of a subject clause or object clause are placed in cells -1, -2, -3 of the zone respectively;
f. modal or serial-verb predicates and coordinate predicates occurring within the subject or object are placed side by side in cell -2 of the zone;
g. the direct object of a double-object construction is placed by rules a and f, and the indirect object is placed in zone (8).
2. The rules for zone (2) are:
a. the core predicate is placed in cell -2 of the zone;
b. in a modal serial-verb predicate, the modal verb is placed in cell -1 of the zone and the core predicate in cell -2;
c. coordinate predicates are placed in cells -2 and -3 of the zone;
d. when there are more than two coordinate predicates, the additional ones are placed side by side in cell -3;
e. the special figure codes for the tense, aspect, and voice of the core predicate (past tense, future tense, progressive aspect, perfect aspect, negative form, passive voice) are placed in cells -4, -7, -6, -9, -5, -8 of the zone respectively, so that tense is expressed with special figure codes.
3. The rule for zones (4) to (6) is that a modifier is placed in the modifier zone aligned with the zone it modifies; a modifier of the core subject in zone (1), for example, must be placed in zone (4). When coordinate subject words each have their own modifier, each modifier must be aligned with the word it modifies.
4. The modifier-zone rules, which can be chosen freely, include:
A. Simple modification
a. single and coordinate modifiers are placed in cells -7, -8, -9 of zone (4);
b. for degree or quantity plus a single modifier, the degree word or numeral-classifier comes first and the single modifier after, placed in the same cell;
c. a single modifier applying to several modified words is aligned with the first modified word.
B. Subject-predicate-object modification
a. a simple subject-predicate-object structure used as a modifier can be placed in cells -4, -5, -6, -1, -2, -3, -7, -8, -9 of the corresponding zone respectively;
b. in a subject-predicate-object modifier containing modal, serial-verb, or coordinate predicates, those predicates are placed side by side in the same cell, in cells -1, -2, -3 of the corresponding zone;
c. a subject-predicate-object modifier that itself contains modifiers is handled as main-zone modification: # is entered in cell -1 of the corresponding zone, then the subject, predicate, and object are placed in cells -5, -2, -8 of that zone, the modifiers in cells -4, -1, -7, and the complements in cells -6, -3, -9. If other information has already been put in the modifier zone and main-zone modification is also needed, a main zone is inserted: the insertion symbol * and a zone number are placed in the cell where the insertion occurs, the zone number immediately following *, and placement within the inserted zone then follows the main-zone modification rule.
C. Multiple modification
a. simple multiple modifiers are placed in cells -4, -1, -7, -5, -2, -8, -6, -3, -9 of the corresponding zone;
b. in multiple modification containing a subject-predicate-object structure, the subject-predicate-object part is handled by insertion.
D. Preposition-object modification
a. in a simple preposition-object structure, the preposition and its object share a cell;
b. in a preposition-object structure containing a subject-predicate-object structure, the preposition shares a cell with the subject word or predicate word;
c. in a preposition-object structure containing multiple modification, the preposition is placed before the first modifier and shares its cell;
E. Coordinate modification: the modification methods of classes A, B, and C can be used in coordination, in the following ways:
a. one-to-one correspondence;
b. one-to-many correspondence;
c. many-to-one correspondence;
d. adversative coordinate modification, in which the adversative word shares a cell with the last modifier before the turn.
5. Other rules include:
A. The placement rules for the additional zones (7)-(9) are basically the same as those for zones (4)-(6); the difference is that simple and coordinate additions are placed in cells -4, -5, -6;
B. Predicate tense is expressed by bringing the tense code of the core zone into the predicate cell;
C. A simple negation of a verb or modifier is placed in the same cell as that verb or modifier;
D. A numeral-measure phrase is placed in a single cell, numeral first and measure word after;
E. A tightly bound phrase is placed in a single cell;
F. A subject or predicate omitted in the natural language must be supplied in the corresponding cell when the sentence is composed;
G. Conjunctions between basic sentences are placed in a dedicated cell;
H. The dedicated symbols marking interrogative, declarative, exclamatory and imperative sentences (such as ? and !) are placed in the punctuation cell;
I. Personal names, place names and onomatopoeia are written in pinyin and enclosed in quotation marks " ".
Cells in the sentence-structure framework can be opened as windows; a zone, together with the whole framework, can be nested into an opened window; and the tense figure code of the core predicate zone can be called repeatedly at other positions on the interface where verbs are placed.
Fig. 8 shows how the simple sentence "we always like some strong sportsmen" is placed in the sentence-composition framework. Its numeric coded form is: 11§11512, 22§12214, 31§01617, 67§10316, 68§21313. Here § means "placed in the cell", the two digits before § are the cell number, and the five digits after § are the ideographic figure code (a one-letter near-synonym extension code should also follow).
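The coded form given for Fig. 8 can be read mechanically. The following is a minimal sketch, not part of the patent: it assumes that the first digit of each two-digit cell (lattice) number is the zone (area) number and the second is the cell number within that zone, and it ignores the optional near-synonym extension letter. The names `Placement` and `parse_placements` are hypothetical.

```python
import re
from typing import NamedTuple


class Placement(NamedTuple):
    zone: int         # assumed: first digit of the cell number
    cell: int         # assumed: second digit of the cell number
    figure_code: str  # five-digit figure code, kept as text to preserve leading zeros


# Two-digit cell number, the § marker, then a five-digit figure code.
# Optional spaces and a stray comma after § are tolerated.
_PLACEMENT = re.compile(r"(\d{2})\s*§\s*,?\s*(\d{5})")


def parse_placements(encoded: str) -> list[Placement]:
    """Split an encoded sentence into (zone, cell, figure code) records."""
    return [
        Placement(int(cell_no[0]), int(cell_no[1]), code)
        for cell_no, code in _PLACEMENT.findall(encoded)
    ]


# The Fig. 8 example sentence in its numeric coded form.
FIG8 = "11§11512, 22§12214, 31§01617, 67§10316, 68§21313"

for placement in parse_placements(FIG8):
    print(placement)
```

Keeping the figure code as a string rather than an integer matters for codes such as 01617, whose leading zero is part of the code.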
Figs. 9 to 12 show the result of placing a complex sentence in the sentence-composition framework according to the placement rules above. The sentence is: "Any country, even one only half-heartedly concerned with its own development, can use its cheap and obedient labour to produce goods for the world market and achieve rapid domestic economic growth." The whole composition result of the fourth frame (Fig. 12) is inserted at the * mark in zone (4) of the first frame (Fig. 9) as the attributive of the subject; the numeric form of this syntactic mark is 48 § (7) followed by the coding result of the fourth frame.
Fig. 13 shows the logic block diagram of a machine translation system implementing the present invention. Because the unified semantic coding scheme for different natural languages lays down comprehensive conventions for the semantics and expression forms of natural language, from the lexical level to the sentence-structure level, the user can enter the semantic input coding system directly in a given natural language. The system restricts the semantic domain of natural-language expression in three ways: only word senses that have been assigned a figure code can appear on the man-machine interface; only words that have obtained a unique numeric code can be called, automatically or manually, into the sentence-structure framework on the interface; and only sentences composed according to the usage rules of the framework are passed on to machine translation. If a word the user enters in natural language has not been encoded, encoded vocabulary can also be retrieved and called by semantic classification.
Because comprehensive conventions have been laid down for the semantic expression of natural language, expression through the coding system loses some subtlety compared with direct natural-language expression but gains substantially in clarity. The result of semantic expression through the coding system can be converted into multiple natural languages, so the sender can convert the composed sentence into his or her own mother tongue to verify it, and can modify it directly on the interface if it is unsatisfactory. Likewise, provided the receiving user understands the sentence-composition rules, when the meaning of a translation is unclear he or she can call up the sender's composition result on the interface and query its semantics directly in a different mother tongue. This gives a machine translation system a strong guarantee of reliability in the transmission of semantic information;
The unified semantic coding scheme of the present invention solves the most difficult development problem in a machine translation system, namely the source-language analysis stage, and the intermediate file format of its semantic expression results can interface directly with generation technology for multiple natural languages. Developing a machine translation system on the basis of the semantic coding system therefore requires only two things: building a correspondence database between the lexical concept system of a natural language and the unified semantic coding concept system (including a picture database of physical objects and a specialized-vocabulary database), and developing the technology for conversion and generation into that natural language (the system shown in Fig. 13). In addition, all natural-language machine translation systems developed on the semantic coding system can interconnect. This can significantly shorten the development cycle of machine translation systems, reduce development cost and raise their application value;
The semantic coding system of the present invention and existing natural language processing technology complement each other well, and combining their strengths yields a well-performing multilingual mutual-translation technique. For example, the user can express meaning directly through the semantic coding interface and have the automatically generated result embedded directly in the translation produced by an existing machine translation system. This keeps the speed advantage of existing machine translation systems while substantively improving the quality of scientific and technical document translation and the practicality of the translation system;
The semantic coding system of the present invention is particularly advantageous on computer networks: semantic information goes through only one machine-translation encoding process and is transmitted over the network in a single coded form, and each network terminal then decodes it into whichever natural language its user needs. This helps save network space, improve the transmission efficiency of network information, and make network information resources widely shareable.
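The "encode once, decode at each terminal" idea can be illustrated with a toy sketch. Everything below is hypothetical: the figure codes 90001 and 90002, the two lexicons and the function `decode` are invented placeholders, not taken from the patent, and real generation into a natural language would also need word order and inflection, which this sketch leaves out.

```python
# Hypothetical per-language lexicons keyed by five-digit figure code.
# The codes and glosses are invented for illustration only.
LEXICONS: dict[str, dict[str, str]] = {
    "en": {"90001": "we", "90002": "like"},
    "zh": {"90001": "我们", "90002": "喜欢"},
}

# A message is transmitted once as (cell number, figure code) pairs.
Message = list[tuple[str, str]]


def decode(message: Message, language: str) -> list[tuple[str, str]]:
    """Replace each figure code with the word of the terminal's language.

    Codes missing from the lexicon are passed through in angle brackets
    so that the receiver can still see that something was sent.
    """
    lexicon = LEXICONS[language]
    return [(cell, lexicon.get(code, f"<{code}>")) for cell, code in message]


message: Message = [("11", "90001"), ("22", "90002")]
print(decode(message, "en"))
print(decode(message, "zh"))
```

The same transmitted message is decoded independently by each terminal, so adding a language only means adding a lexicon on the receiving side.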
Any ordinary user with a senior-high-school education or above can, after short-term training, use the semantic coding system of the present invention to overcome communication barriers between different natural languages. Although the result is not as vivid, delicate, graceful and natural as human translation, the adequacy of its semantic expressive power and the clarity of semantic information transmission are well assured.

Claims (4)

1. A computer input method of confined-semantic unified coding for different natural languages, comprising unified semantic coding at the lexical level and unified semantic coding at the sentence-structure level, characterized in that:
The unified semantic coding at the lexical level comprises:
1) on the basis of statistics of multilingual synonyms in natural languages, and with reference to semantic-classification headwords and sets of semantic primitives, establishing a shared concept system made up of the basic concepts of prepositions, conjunctions, verbs, adverbs, adjectives and abstract nouns;
2) expanding each shared basic concept by degree increase, degree decrease, near-synonymy, commendatory sense, derogatory sense, spoken language, written language, slang, colloquialism and idiom to build near-synonym extension codes;
3) using ideographic graphic symbols as the form codes of the shared concepts;
4) annotating the shared concepts with definitions that need no explanation, using definition by semantic superposition of near-synonyms from each natural language;
5) uniformly coding the shared concepts, which consist of basic concepts and their near-synonym extension codes, the coded form consisting of the number of the ideographic graphic symbol and the near-synonym extension code;
6) establishing the correspondence between natural-language concepts and the shared concepts with their near-synonym extension codes, a polysemous word obtaining multiple coding results;
7) when establishing the correspondence between natural-language concepts and shared concepts, if a basic concept among the shared concepts has no equivalent word in a given natural language, describing that basic concept by a definition in that natural language;
8) when establishing the correspondence between natural-language concepts and shared concepts, if a near-synonym extension code among the shared concepts has no equivalent word, substituting the basic concept or describing it by a definition in that natural language;
9) establishing a picture atlas of concrete-object nouns with corresponding concept numeric codes, used to distinguish the senses of polysemous words;
10) establishing a correspondence database of specialized vocabulary across different natural languages;
11) treating any natural-language vocabulary that obtains no code through the above steps as redundant concepts, which are not coded;
The unified semantic coding at the sentence-structure level comprises:
1) converting the sentence-structure concept system into a spatially positioned sentence-structure framework recognizable by both human and machine, in which sentence-structure concepts are expressed by the position of zones within the sentence-composition framework;
2) forming nine sentence-structure zones from the vertical modifier zone, core zone and additional zone and the horizontal subject zone, predicate zone and object zone; dividing each zone into 3 × 3 cells, each cell being numbered by its zone number and cell number; and placing the shared concepts of a sentence composed in any natural language, in ideographic figure-code and numeric-code form, into the corresponding zones and cells of the framework according to their sentence structure, thereby obtaining the syntactic-information coding of the semantic expression.
2. The computer input method of confined-semantic unified coding for different natural languages according to claim 1, characterized in that: the ideographic graphic symbols are image codes in one-to-one correspondence with the common word senses of any language, the image codes being generated by gestalt processing using dictionary definition, symbolism, deduction, metaphor, borrowing from sign language, definition by contrast, definition by typical example, and definition by description.
3. The computer input method of confined-semantic unified coding for different natural languages according to claim 1, characterized in that: using the ideographic graphic symbols as the form codes of the shared concepts comprises arranging image codes with similar functional word senses in a coordinate matrix to form a class chart, the vertical and horizontal coordinate numbers of each image code's position in the class chart being its class code; arranging the image codes of different word senses within the same class code in coordinate matrices, page by page, to form position charts, the vertical and horizontal coordinate numbers of each image code's position on one page being its position code; and the class code, page number and position code, in that order, constitute a fixed-length five-digit number for any image code, the digits being assigned to the numeric keys of the computer keyboard.
4. The computer input method of confined-semantic unified coding for different natural languages according to claim 1, characterized in that: the cells in the sentence-structure framework can be opened as windows, any zone, together with the whole framework, can be nested into an opened window, and the tense figure code of the core predicate zone can be called repeatedly at other positions on the interface where verbs are placed.
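The five-digit figure code described in claim 3 (class code, page number, position code) can be sketched as a simple fixed-length record. This is an illustration under stated assumptions, not the patent's implementation: it assumes each coordinate is a single digit 0-9, that the class code and position code each contribute two digits (vertical coordinate first) and the page number one digit, which is one way the parts add up to five digits. The names `FigureCode`, `split_code` and `compose_code` are hypothetical.

```python
from typing import NamedTuple


class FigureCode(NamedTuple):
    class_row: int  # assumed: vertical coordinate in the class chart
    class_col: int  # assumed: horizontal coordinate in the class chart
    page: int       # assumed: single-digit page number within the class
    pos_row: int    # assumed: vertical coordinate on the page
    pos_col: int    # assumed: horizontal coordinate on the page


def split_code(code: str) -> FigureCode:
    """Split a five-digit figure code into its five single-digit parts."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected five ASCII digits, got {code!r}")
    return FigureCode(*(int(ch) for ch in code))


def compose_code(parts: FigureCode) -> str:
    """Join the five parts back into a fixed-length five-digit string."""
    if any(not 0 <= digit <= 9 for digit in parts):
        raise ValueError("every part must be a single digit 0-9")
    return "".join(str(digit) for digit in parts)


# 01617 is one of the figure codes in the Fig. 8 example.
parts = split_code("01617")
print(parts)
print(compose_code(parts))
```

Because every part is a single decimal digit, the whole code can be typed on the numeric keys alone, which matches the claim's statement that the digits are assigned to the numeric keys of the keyboard.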
CN 96107009 1996-07-02 1996-07-02 Computor input method of limited-semateme encoding of different natural language Pending CN1169555A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN 96107009 CN1169555A (en) 1996-07-02 1996-07-02 Computor input method of limited-semateme encoding of different natural language
AU33336/97A AU3333697A (en) 1996-07-02 1997-07-02 Computer input method of confined semantic unifying encoding for different natural languages and computer input system thereof
PCT/CN1997/000069 WO1998000773A1 (en) 1996-07-02 1997-07-02 Computer input method of confined semantic unifying encoding for different natural languages and computer input system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 96107009 CN1169555A (en) 1996-07-02 1996-07-02 Computor input method of limited-semateme encoding of different natural language

Publications (1)

Publication Number Publication Date
CN1169555A true CN1169555A (en) 1998-01-07

Family

ID=5119457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 96107009 Pending CN1169555A (en) 1996-07-02 1996-07-02 Computor input method of limited-semateme encoding of different natural language

Country Status (3)

Country Link
CN (1) CN1169555A (en)
AU (1) AU3333697A (en)
WO (1) WO1998000773A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102959904A (en) * 2010-06-16 2013-03-06 索尼移动通讯有限公司 User-based semantic metadata for text messages
CN103020042A (en) * 2011-09-22 2013-04-03 株式会社东芝 Machine translation apparatus and method of machine translation
CN112115722A (en) * 2020-09-10 2020-12-22 文化传信科技(澳门)有限公司 Human brain-simulated Chinese analysis method and intelligent interaction system
CN112214649A (en) * 2020-10-21 2021-01-12 北京航空航天大学 Distributed transaction solution system of temporal graph database

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111326139B (en) * 2020-03-10 2024-02-13 科大讯飞股份有限公司 Language identification method, device, equipment and storage medium
CN112507705B (en) * 2020-12-21 2023-11-14 北京百度网讯科技有限公司 Position code generation method and device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CZ283469B6 (en) * 1992-06-02 1998-04-15 Sumitomo Chemical Company, Limited Aluminium {alpha}-oxide
CN1031228C (en) * 1993-02-24 1996-03-06 刘莎 Special purpose pocket calculator for social intercourse
CN1121597A (en) * 1994-10-24 1996-05-01 中国物资贸易发展总公司 pattern-code self-definition phonetic input method and electronic self-calling device

Also Published As

Publication number Publication date
WO1998000773A1 (en) 1998-01-08
AU3333697A (en) 1998-01-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C01 Deemed withdrawal of patent application (patent law 1993)
WD01 Invention patent application deemed withdrawn after publication