Embodiment
First, the semantics-based machine translation method of the present invention is described with reference to Fig. 1. Fig. 1 is a flowchart of a semantics-based machine translation method according to an embodiment of the invention. Unlike the prior art, the machine translation method of the present invention is based on semantics. For a better understanding of the invention, some of the terms and concepts involved are explained below:
In natural language, a unit that expresses a meaning is called a semantic primitive, for example, "engineer".
The expression of such a unit (that is, a semantic primitive) in any specific natural language (for example English, Chinese, etc.) is called the semantic element representation of that semantic primitive in that language. For example, the Chinese representation of the semantic primitive "engineer" is "工程师", and its English representation is "engineer".
The meaning of a sentence in any specific natural language is called a sentence meaning, for example, "I am a student". A sentence meaning is composed of semantic primitives. For example, the sentence meaning "I am a student" is composed of three semantic primitives: "I", "student", and "is-title(<who>, <what title>)". Here <who> and <what title> are two parameters, each of which must be replaced by a semantic primitive.
A parameter of a semantic primitive is filled by replacing it with a semantic primitive, that is, by substitution. A semantic primitive after substitution is called a semantic primitive substitution formula, which is a compound semantic primitive. A sentence meaning can be written as a sentence-meaning expression, that is, a semantic primitive substitution formula in which all parameters have been replaced, for example: is-title(I, student).
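The substitution described above can be pictured as building a small tree. The following is a minimal sketch, not the data structure of the embodiment: the class name `Expr` and the English mnemonics "I", "student" and "is-title" are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """A semantic primitive whose parameters have all been substituted."""
    head: str                      # mnemonic of the primitive (illustrative)
    args: Tuple["Expr", ...] = ()  # substituted parameters, themselves expressions

    def __str__(self) -> str:
        if not self.args:
            return self.head
        return f"{self.head}({', '.join(str(a) for a in self.args)})"


# "is-title" has two parameters, <who> and <what title>; both are substituted here.
i = Expr("I")
student = Expr("student")
sentence = Expr("is-title", (i, student))
print(sentence)
```

Because every parameter is itself an `Expr`, compound semantic primitives nest to any depth.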
A sentence in a specific natural language is the representation of a sentence meaning in that language. For example, the Chinese representation of "is-title(I, student)" is "我是学生", and its English representation is "I am a student".
The semantic language consists of all semantic primitives. The semantic language is unified and independent of any specific language. A specific natural language consists of all the semantic element representations of that language. A specific natural language can therefore be regarded as one representation of the semantic language.
Different natural languages can be translated into one another, and speakers of different natural languages can communicate, precisely because different natural languages have semantic element representations corresponding to the same semantic primitive and sentences corresponding to the same sentence meaning, or because a group of sentences can be constructed to express that semantic primitive or sentence meaning.
The world's nearly 4000 languages, including English, Chinese, Japanese, German and so on, can all be regarded as different representations of the unified semantic language.
As shown in Fig. 1, according to this embodiment of the invention, first in step 100 one sentence is extracted from the source text. The source text here means the text, in its source language, that is to be translated. In machine translation, the source text to be translated is first input into a computer, which can be done by various prior-art methods such as keyboard input, scanning and recognition, or retrieval from another computer over a network. The input source text is generally a whole article or a passage, so one sentence must first be extracted from it.
Then, in step 200, the sentence is semantically analyzed according to the semantic element representation library to obtain the sentence-meaning expression of the sentence. Next, in step 300, the sentence-meaning expression is expanded into its representation in the target language according to the semantic element representation library. Finally, in step 400, the expanded sentence is output as the translation. The machine translation method of the embodiment is described in detail below with reference to Figs. 2 to 4.
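The four steps of Fig. 1 can be sketched as a small driver loop. This is only an outline under stated assumptions: the function name `translate` is hypothetical, the analysis and expansion steps are passed in as callables, the input is assumed to hold one sentence per line, and the two lookup tables are toy stand-ins rather than a real analyzer or expander.

```python
from typing import Callable, Iterator


def translate(text: str,
              analyze: Callable[[str], str],
              expand: Callable[[str], str]) -> Iterator[str]:
    """Yield one translated sentence per non-empty input line."""
    for sentence in text.splitlines():   # step 100: extract one sentence
        sentence = sentence.strip()
        if not sentence:
            continue
        expression = analyze(sentence)   # step 200: semantic analysis
        yield expand(expression)         # steps 300 and 400: expand and output


# Toy stand-ins (assumptions) so that the driver can be run on its own.
toy_analysis = {"陈先生是工程师": "1(2(3), 4)"}
toy_expansion = {"1(2(3), 4)": "Mr. Chen is an engineer"}

result = list(translate("陈先生是工程师\n",
                        toy_analysis.__getitem__,
                        toy_expansion.__getitem__))
print(result)
```

Keeping analysis and expansion as separate callables mirrors the method: the intermediate expression is the only thing the two steps share.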
Figs. 2A and 2B show an example of the semantic element representation library used in the embodiment of the invention. The semantic element representation library is a data set that records the semantic representations of one or more natural languages.
Fig. 2A shows the semantic primitives of this example and their content in the semantic element representation library, including the semantic element representation in each natural language. As shown in Fig. 2A, the semantic element representation library comprises the following fields: semantic primitive ID, which uniquely identifies a semantic primitive and can generally be a sequence number or any other non-repeating number or character string (the row number can of course be used instead, in which case the ID need not be stored); number and type of parameters, which records the number and types of the parameters of the semantic primitive; Chinese representation, which records the Chinese representation of the semantic primitive; English representation, which records the English representation of the semantic primitive; and Japanese representation, which records the Japanese representation of the semantic primitive.
Fig. 2B lists the semantic primitives together with their corresponding easy-to-remember (mnemonic) notations.
As the examples of Figs. 2A and 2B show, the semantic element representation library is in effect a database of semantic primitives that uses a primary key or the semantic primitive ID to associate the representations of a semantic primitive in different languages. It should be understood that other variations of the library are possible. For example, the semantic representations for each language can be recorded in a separate table, and the tables for the several languages can then be linked by primary or foreign keys. The library can also include other fields, such as fields recording the attributes of a semantic primitive's parameters and the correspondence between parameters.
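One way to picture such a library is as two linked database tables. The sketch below uses Python's standard `sqlite3` module; the table names, column names, type labels and the "{}" slot notation in the templates are all assumptions made for illustration, and the rows are inferred from the example sentence rather than copied from Fig. 2A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE primitive (
    id INTEGER PRIMARY KEY,      -- semantic primitive ID
    n_params INTEGER,            -- number of parameters
    param_types TEXT             -- parameter types (hypothetical labels)
);
CREATE TABLE representation (
    primitive_id INTEGER REFERENCES primitive(id),
    lang TEXT,                   -- language code
    template TEXT,               -- "{}" marks a parameter slot (assumed notation)
    PRIMARY KEY (primitive_id, lang)
);
""")
conn.executemany("INSERT INTO primitive VALUES (?, ?, ?)", [
    (1, 2, "person,title"), (2, 1, "name"), (3, 0, ""), (4, 0, ""),
])
conn.executemany("INSERT INTO representation VALUES (?, ?, ?)", [
    (1, "zh", "{}是{}"), (1, "en", "{} is an {}"), (1, "ja", "{}は{}です"),
    (2, "zh", "{}先生"), (2, "en", "Mr. {}"), (2, "ja", "{}さん"),
    (3, "zh", "陈"), (3, "en", "Chen"), (3, "ja", "陳"),
    (4, "zh", "工程师"), (4, "en", "engineer"), (4, "ja", "技師"),
])


def representation(primitive_id, lang):
    """Return the template of a primitive in one language, or None if absent."""
    row = conn.execute(
        "SELECT template FROM representation WHERE primitive_id = ? AND lang = ?",
        (primitive_id, lang),
    ).fetchone()
    return row[0] if row else None


print(representation(2, "en"))
```

Supporting a further language in this layout means adding rows to `representation`; the `primitive` table is untouched.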
Fig. 3 is a detailed flowchart of the semantic analysis step in the semantics-based machine translation method according to an embodiment of the invention. As shown in Fig. 3, first in step 201 all semantic element representations that match within the source sentence, together with their corresponding semantic primitives, are found in the semantic element representation library. Many matching methods exist in the prior art; for example, search methods commonly used in artificial intelligence, such as breadth-first and depth-first search, can implement this step. Then, in step 202, each semantic element representation in the sentence that has no parameters, or whose parameters have all been replaced, is replaced with its corresponding semantic primitive. In step 203 it is determined whether all semantic element representations have been replaced; if not, step 202 is repeated. Steps 202 and 203 must be repeated because, in real language, syntactic units are often nested in several layers. Once the result of step 203 is yes, the sentence-meaning expression is formed in step 204.
The semantic analysis process is described below with a concrete example sentence.
Suppose the source text is the Chinese sentence "陈先生是工程师" ("Mr. Chen is an engineer"). Its semantic analysis proceeds as follows:
陈先生是工程师 →
陈(3)先生是工程师(4) →
先生(陈)(2)是工程师 →
是现职称(先生(陈), 工程师)(1)
As shown above, "陈" (Chen) and "工程师" (engineer) are first replaced with the semantic primitives whose IDs are 3 and 4. Next, the parameterized semantic element representation "先生" (Mr.) is replaced, because its parameter "陈" has already been replaced. Finally, the parameterized semantic element representation "是" (is) is replaced, because both parameters of this semantic primitive have been replaced. The result, "是现职称(先生(陈), 工程师)", is the sentence-meaning expression of this sentence. Note that the description above uses the mnemonic notation for semantic primitives; in a computer, a semantic primitive can instead be represented by a machine-readable label such as its semantic primitive ID, so the above sentence-meaning expression can be written as 1(2(3), 4).
Similarly, the analysis of the corresponding English and Japanese sentences proceeds as follows:
Mr. Chen is an engineer →
Mr. Chen(3) is an engineer(4) →
Mr.(Chen)(2) is an engineer →
IsTP(Mr.(Chen), engineer)(1);
陳さんは技師です →
陳(3)さんは技師(4)です →
さん(陳)(2)は技師です →
です(さん(陳), 技師)(1)
As these examples show, sentences in different languages that express the same meaning all yield the same final sentence-meaning expression: 1(2(3), 4).
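The analysis loop of steps 202 to 204 can be sketched for the three example sentences as follows. This is a simplified illustration, not the matching method of the embodiment: the "{}" slot notation, the type labels (name, person, title, sentence), the internal `<type:index>` markers and the function names are assumptions, and matching is interleaved with replacement instead of being done up front as in step 201. The type labels stand in for the "number and type of parameters" field; they stop "is" from binding to "Chen" before "Mr." has.

```python
import re

# Hypothetical library entries: ID -> (result type, parameter types, templates).
LIBRARY = {
    1: ("sentence", ("person", "title"),
        {"zh": "{}是{}", "en": "{} is an {}", "ja": "{}は{}です"}),
    2: ("person", ("name",), {"zh": "{}先生", "en": "Mr. {}", "ja": "{}さん"}),
    3: ("name", (), {"zh": "陈", "en": "Chen", "ja": "陳"}),
    4: ("title", (), {"zh": "工程师", "en": "engineer", "ja": "技師"}),
}


def build_pattern(template, param_types):
    """Regex in which each slot only matches an already replaced unit of the right type."""
    pieces = [re.escape(piece) for piece in template.split("{}")]
    pattern = pieces[0]
    for param_type, piece in zip(param_types, pieces[1:]):
        pattern += rf"<{param_type}:(\d+)>" + piece
    return pattern


def analyze(sentence, lang):
    nodes = []       # nodes[k] is the expression behind marker <type:k>
    text = sentence
    changed = True
    while changed:   # steps 202 and 203: repeat until nothing more can be replaced
        changed = False
        for prim_id, (result_type, param_types, templates) in LIBRARY.items():
            match = re.search(build_pattern(templates[lang], param_types), text)
            if match is None:
                continue
            args = [nodes[int(k)] for k in match.groups()]
            nodes.append(f"{prim_id}({', '.join(args)})" if args else str(prim_id))
            marker = f"<{result_type}:{len(nodes) - 1}>"
            text = text[:match.start()] + marker + text[match.end():]
            changed = True
    done = re.fullmatch(r"<\w+:(\d+)>", text)
    if done is None:
        raise ValueError(f"could not fully analyze {sentence!r}; stuck at {text!r}")
    return nodes[int(done.group(1))]   # step 204: the sentence-meaning expression


for lang, sentence in [("zh", "陈先生是工程师"),
                       ("en", "Mr. Chen is an engineer"),
                       ("ja", "陳さんは技師です")]:
    print(lang, analyze(sentence, lang))
```

A realistic analyzer would also have to handle ambiguous matches, inflection and article choice, none of which this sketch attempts.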
It should be understood that there are many other ways to perform the above semantic analysis and obtain the sentence-meaning expression.
Fig. 4 is a detailed flowchart of the semantic expansion step in the semantics-based machine translation method according to an embodiment of the invention. As shown in Fig. 4, first in step 301 the sentence-meaning expression is scanned sequentially and the first semantic primitive that has not yet been expanded is read. Then, in step 302, the semantic element representation in the target language is looked up in the semantic element representation library, and the semantic primitive is expanded according to that representation. Next, in step 303, it is determined whether all semantic primitives have been expanded. If so, the translation in the target language is obtained in step 304; otherwise, steps 301 to 303 are repeated.
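The scan-and-expand loop of steps 301 to 304 can be sketched as follows. The template table, its "{}" slot notation and the function names `parse` and `expand` are assumptions for illustration; the expression syntax is the ID form, such as 1(2(3), 4).

```python
# Hypothetical templates: ID -> {language: representation}; "{}" marks a parameter slot.
TEMPLATES = {
    1: {"zh": "{}是{}", "en": "{} is an {}", "ja": "{}は{}です"},
    2: {"zh": "{}先生", "en": "Mr. {}", "ja": "{}さん"},
    3: {"zh": "陈", "en": "Chen", "ja": "陳"},
    4: {"zh": "工程师", "en": "engineer", "ja": "技師"},
}


def parse(expression):
    """Parse '1(2(3), 4)' into nested (id, [args]) tuples."""
    text = expression.replace(" ", "")
    pos = 0

    def node():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        prim_id = int(text[start:pos])
        args = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                     # skip "("
            args.append(node())
            while text[pos] == ",":
                pos += 1                 # skip ","
                args.append(node())
            pos += 1                     # skip ")"
        return prim_id, args

    return node()


def expand(expression, lang):
    items = [parse(expression)]          # tuples are primitives not yet expanded
    while True:
        # Step 301: scan for the first primitive that is not yet expanded.
        index = next((i for i, item in enumerate(items)
                      if not isinstance(item, str)), None)
        if index is None:                # step 303: everything is expanded
            return "".join(items)        # step 304: the translation
        prim_id, args = items[index]
        # Step 302: interleave the target-language pieces with the parameters.
        pieces = TEMPLATES[prim_id][lang].split("{}")
        replacement = [pieces[0]]
        for arg, piece in zip(args, pieces[1:]):
            replacement += [arg, piece]
        items[index:index + 1] = replacement


print(expand("1(2(3), 4)", "en"))
```

The loop expands primitive 1 first, then 2, 3 and 4, which is the left-to-right order the flowchart describes.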
The semantic expansion process is described below with a concrete example sentence.
Suppose the sentence-meaning expression to be expanded is 1(2(3), 4), that is,
是现职称(先生(陈), 工程师) = IsTP(Mr.(Chen), engineer) = です(さん(陳), 技師)
The expansion of this sentence-meaning expression into Chinese proceeds as follows:
是现职称(先生(陈), 工程师) → 先生(陈)是工程师 → 陈先生是工程师 → 陈先生是工程师 → 陈先生是工程师
As shown above, the sentence-meaning expression is first scanned sequentially, and the first semantic primitive found, "是现职称(X1, X2)", is expanded according to its Chinese semantic element representation: "是" is placed in the middle, and the two parameters "先生(陈)" and "工程师" are placed on either side of "是". Next, the semantic primitive "先生(X)" is expanded according to its Chinese semantic element representation: the semantic primitive "陈", as the parameter, is placed before "先生". Then the semantic primitives "陈" and "工程师" are expanded in turn according to their Chinese semantic element representations, finally yielding the Chinese translation "陈先生是工程师".
Similarly, the expansion of this sentence meaning into English and Japanese proceeds as follows:
IsTP(Mr.(Chen), engineer) => Mr.(Chen) is an engineer => Mr. Chen is an engineer => Mr. Chen is an engineer
です(さん(陳), 技師) => さん(陳)は技師です => 陳さんは技師です => 陳さんは技師です
It should be understood that the specific expansion steps described above can vary in many ways. For example, instead of following the order of the semantic primitives in the sentence-meaning expression, expansion can follow an order similar to that of semantic analysis: semantic primitives that have no parameters, or whose parameters have all been expanded, are expanded first, and all semantic primitives are then expanded recursively, layer by layer.
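The inside-out variation just described can be sketched with plain recursion: each primitive is expanded only after all of its parameters have been. The template table, the "{}" slot notation, the nested-tuple form of the expression and the function name are assumptions for illustration.

```python
# Hypothetical templates: ID -> {language: representation}; "{}" marks a parameter slot.
TEMPLATES = {
    1: {"zh": "{}是{}", "en": "{} is an {}", "ja": "{}は{}です"},
    2: {"zh": "{}先生", "en": "Mr. {}", "ja": "{}さん"},
    3: {"zh": "陈", "en": "Chen", "ja": "陳"},
    4: {"zh": "工程师", "en": "engineer", "ja": "技師"},
}


def expand_inside_out(tree, lang):
    """Expand the parameters first, then the primitive that contains them."""
    prim_id, args = tree
    expanded_args = [expand_inside_out(arg, lang) for arg in args]
    return TEMPLATES[prim_id][lang].format(*expanded_args)


# 1(2(3), 4) written directly as nested (id, [args]) tuples.
tree = (1, [(2, [(3, [])]), (4, [])])
translations = {lang: expand_inside_out(tree, lang) for lang in ("zh", "en", "ja")}
print(translations)
```

The same tree is expanded into each language for which the table holds a representation.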
As the above description shows, by converting the source text into sentence-meaning expressions, the machine translation method of the present invention can produce translations in several target languages at once, provided that the semantic element representation library contains the semantic element representations for the corresponding languages.
According to another embodiment of the invention, after a sentence of the source text is converted into a sentence-meaning expression, the method further comprises a step of saving the sentence-meaning expression in a storage device; only after a paragraph or all sentences of the source text have been converted and saved are they expanded into sentences of the target language as needed. In other words, the source text is first semantically analyzed, the resulting set of sentence-meaning expressions is saved, and the set is expanded into natural language of the required language when needed.
Fig. 5 is a block diagram of a semantics-based machine translation system according to an embodiment of the invention. As shown in Fig. 5, the machine translation system of this embodiment comprises: a source text memory 501 for storing the source text to be translated; a semantic element representation library 506 for recording the semantic representations, in two or more languages, corresponding to the semantic primitives; a semantic analyzer 504 for analyzing sentences in the source text and converting them into sentence-meaning expressions according to the source-language semantic element representations of the semantic primitives recorded in the semantic element representation library 506; a sentence-meaning expression memory 502 for storing the sentence-meaning expressions produced by the semantic analyzer 504; a semantic expander 505 for expanding the sentence-meaning expressions into sentences of the target language according to the target-language semantic representations recorded in the semantic element representation library 506; and a translation output unit 503 for outputting, as the translation, the target-language sentences expanded by the semantic expander 505.
Those skilled in the art will understand that the above machine translation system can be a computer or another computing device with processing capability. Such a computing device should include a processor, a memory, and corresponding input/output devices. The components of the machine translation system can be implemented in hardware or in software. Of course, a user can use the system over a network, and can also use it to help search, read, or translate online information.
In addition, as is well known, the object of the present invention can also be achieved by providing a system or device with a recording medium on which software program code implementing the functions of the foregoing embodiments is recorded. The program code is computer-readable: the computer (or CPU or MPU) of the system or device reads the program code stored on the recording medium and executes it. In this case, the program code read from the recording medium itself implements the functions of the foregoing embodiments, and the recording medium on which the program code is recorded constitutes the present invention. The recording medium used to record the program code or variable data such as tables can be a magnetic disk (such as a floppy disk or hard disk), an optical disc, or any non-volatile memory card.
The principles, features, and advantages of the present invention have been described above by way of specific embodiments. It should be understood that the invention is not limited to these specific embodiments; many variations are possible, and the specific implementation steps may also differ. The scope of protection of the invention is defined only by the appended claims.